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ABSTRACT 

Chen, Chih-Chien Thomas Ph.D., Purdue University, August 1988. 
Spectral Feature Design in High Dimensional Multispectral Data. Major 
Professor : David A. Landgrebe. School of Electrical Engineering. 

The High resolution Imaging Spectrometer (HIRIS) is designed to 
acquire images simultaneously in 192 spectral bands in the 0.4-2. 5 pm 
wavelength region. It will make possible the collection of essentially continuous 
reflectance spectra at a spectral resolution sufficient to extract significantly 
enhanced amounts of information from return signals as compared to existing 
systems. : By effectively utilizing these signals, direct identification of the 
parameters of species can be achieved and their subtle changes can also be 
observed and measured.. 

The advantages of such high dimensional data come at a cost of 
increased system and data complexity. For example, since the finer the 
spectral resolution, the higher the data rate, it becomes impractical to design 
the sensor to be operated continuously. Even operating HIRIS in a request only 
mode, its 512 Mbps raw data rate still constitutes a serious communication 
challenge. In order to solve this problem, it is essential to find new ways to 
preprocess the data which reduce the data rate while at the same time 
maintaining the information content of the high dimensional signal produced. 
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In this thesis, four spectral feature design techniques are developed from 
the Weighted Karhunen-Loeve Transforms. They are r non-overlapping band 
feature selection algorithm, overlapping band feature selection algorithm, 
Walsh function approach, and infinite clipped optimal function approach. From 
a simplicity and effectiveness point of view, ^the infinite clipped optimal function 
- approach is chosen since the features are easiest to find and their classification 
performance is the best. This technique approximates the spectral structure of 
the optimal features via infinite clipping and results in transform coefficients 
which are either +1, -1 or 0. Therefore the necessary processing can be easily 
implemented on-board the spacecraft by using a set of programmable adders 
that operate on the grouping instructions received from the ground station. 

After the preprocessed data has been received at the ground station, 
canonical analysis is further used to find the best set of features under the 
criterion that maximal class separability is achieved. 

In this research, both 100 dimensional vegetation data and 200 
dimensional soil data are used to test the spectral feature design system. It will 
be shown that the infinite clipped versions of the first 16 optimal features 
derived from the Weighted Karhunen-Loeve Transform have excellent 
classification performance. Further signal processing by canonical analysis 
increases the compression ratio and retains the classification accuracy. The 
overall probability of correct classification is over 90% while providing for a 
reduced downlink data rate by a factor of 10. 
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CHAPTER I 
INTRODUCTION 


1.1 Research Objective 

Due to the recent advance in optics and solid state technology, it is now 
possible to build sensors with much finer spectral resolution. This will provide 
the opportunity for collecting data for a much enriched information source. For 
example, the future High resolution Imaging Spectrometer (HIRIS) is planned to 
have as many as 192 spectral bands [1]. Since the signal dimensionality is 
tremendously increased, current techniques for analyzing multispectral data 
would not be adequate. In order to effectively utilize the information collected 
and achieve these benefits from the high dimensional measurements, it is 
essential to find new ways to process the data which reduce the data rate while 
at the same time maintaining the information content of the signals produced. 

The fundamental objective of this research is to develop an objective and 
practical spectral feature design technique for high dimensional multispectral 
data. 

One possible approach that might be used to accomplish the design 
objective is to tailor the spectral features to the particular analysis problem at 
hand. Features might be made up by grouping (i.e. summing) the narrow band 
response functions in particular spectral regions on board the spacecraft, based 
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upon the particular classes of ground cover parameters that are to be identified. 
The main advantage of this approach is the possibility of local optimality. 
Instead of finding optimal features with respect to all possible scenes (global 
optimal), a more practical and adaptive approach is introduced for each 
individual situation. The maximal attainable performance of local optimal 
features is indeed better and at least not worse, than that of global optimal ones. 
The problem then reduces to finding a means for deciding how to choose these 
band groupings effectively for each different analysis situation such that the 
data rate is greatly reduced while the classification performance is preserved or 
increased. 


1.2 Previous Approaches 

There have been basically four approaches to this feature design 
problem. They are (1) in-depth studies of physical considerations, (2) empirical 
methods, (3) simulation methods, and (4) analytical approaches. 

Important physical considerations which have been investigated are 
atmospheric effects and the interaction of light with various cover types. By 
evaluating the transmittance of the atmosphere over the spectral interval of 
interest [2,3] , one can eliminate certain portions of the interval, since little or no 
information content is contained in those regions. 

The interaction of electromagnetic radiation with plant leaves [4] , soils [5] 
and waters [6] has been studied in the past to find the most effective spectral 
features for discrimination. A typical procedure for these studies is to take 
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measurements with a spectroradiometer on restricted information classes over 
the entire spectrum. Then the average of the spectral responses is found and 
the subsequent conclusion is drawn from the average. The basic disadvantage 
of this approach is that only the mean value is considered. The potential 
information in the variance and covariance is neglected and lost. 

The second method is empirical in that a scanner with many spectral 
bands is constructed, and the selection of the bands is done experimentally. 
The studies have been done with agriculture cover types [7] , forest covers [8], 
and geological applications [9]. The main advantage of the empirical method is 
the retaining of the information in the variations about the mean. The 
correlation is considered in the feature design process. However, the spectral 
sampling is crude and incomplete for representing the whole spectrum. 

Simulation methods have been developed [10] to generate typical 
spectra according to a scene model. These artificial spectral response 
functions are then used to choose the best set of features. However, due to the 
complexity of the scene and the interrelations of various parameters [11], an 
accurate enough model of the scene is not available yet up to present. 

The recent advances in optical and solid state technologies make it 
possible to build high dimensional multispectral sensors such as H IRIS, with a 
spectral resolution of 10 nm and a spatial resolution of 30m [1]. In order to 
effectively utilize, including acquire, archive, retrieve, transmit and analyze the 
data collected, analytical feature design approaches are sought because of 
their objective and machine-oriented natures. Early works of this approach are 
found in Wiswell's and Wiersma's Ph.D dissertations. Wiswell [12] studied the 
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feasibility of using the maximal average mutual information [13] as a criterion to 
evaluate the spectral features. The best set of features are chosen so as to 
obtain the minimal reduction In uncertainly about the scene alter the 
observation is made. The research showed that average mutual information is 
a useful concept to construct the feature sets. However the relationship 
between average mutual information and global performance criterion such as 
classification accuracy was not demonstrated. Moreover, the technique was 
only applied to much lower dimensional signals (about 10); the feasibility for 
high dimensional signals in the range of one or two hundred spectral bands 
was not shown. 

Wiersma and Landgrebe [14,15] proposed the use of minimum mean 
square representation error criterion for feature design. It was shown that an 
analytical feature design procedure can be established by applying a weighted 
Karhunen-Loeve Transform [16,17,18] to the observation space in which the 
eigenvectors of the transform are the optimal (though impractically complex) 
spectral features. The dimensionality in this research was 100 which was much 
higher than that in Wiswell's work. A manual band feature selection was 
suggested according to the relative importance of spectral regions as indicated 
by the eigenfunctions. The concept of spectral dominancy was introduced 
although the final stages of the feature design process were manually 
implemented. This appears to be tedious and impractical when the number of 
cover types is greatly increased. Another drawback in Wiersma's work lies 
basically in the subjective nature of the manual feature design process. 
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1.3 Current Investigation 

The research results presented here will adopt some procedures to 
extend Wiersma's work in such a way that objective, machine implemented 
spectral feature design schemes become feasible. The idea of local optimality 
is introduced in this thesis. Instead of finding the features that are optimal with 
respect to all possible scenes (global optimal), it is now proposed to tailor the 
spectral features to the specific user problem at hand. The maximally attainable 
performance can then be increased. The new concept of structure similarity 
and its realization are discussed in this dissertation. This makes the feature 
design problem more general in the sense that overlapping features become 
practical and easily implemented. 

In this research four methods are developed which in effect lead to 
suboptimal but now practical versions of the optimal features. These derived 
spectral features were obtained by combining groups of adjacent spectral 
samples into bands, usually one or more hundred nanometers wide, that are 
specially tailored to the analysis task at hand. These features could be 
implemented by utilizing simple programmable adders at the sensor output as 
shown in Figure 1.1 
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ON-BOARD FEATURE FORMATION SYSTEM 
SCHEMATIC DIAGRAM 



N = no. of Spectral Samples collected 
= no. of Spectral Features desired 


Figure 1.1 Realization of Spectral Feature Design 
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In Figure 1.1, N is the signal dimensionality from the sensor output, and 
Nf is the number of spectral features used. The programmable adders on board 
the spacecraft act according to the received grouping instruction from the 
ground station, either adding (+1), subtracting (-1) or omitting ( 0) bands for 
each spectral function. The resulting features are then transmitted down to the 
ground station for further processing. 

The first method is based on the dominancy property of the spectral 
bands. A manually subjective selection process was used previously in 
Wiersma's work [14,15]. In this research, an objective and machine oriented 
process is developed. The spectral band edges are found by applying infinite 
clipping [21] to the average of the first few eigenvectors associated with the 
largest eigenvalues. This technique is referred to as a non-overlapping (N.O.L.) 
band feature selection algorithm due to the fact that designed features are not 
overlapping. 

The second approach utilizes a transformation from the optimal feature 
space to a new space based upon Walsh Functions (W.F.) [19,20]. These 
functions have the attractive features of being everywhere equal to either +1 or - 
1, and being ordered by the number of axis crossings. Thus the transformation 
can be implemented by either adding or subtracting bands, and the various 
functions will correspond to spectral ranges of a variety of widths. 

The third scheme applies infinite clipping (I.C.) [21] to the original optimal 
functions derived from the weighted K-L transform. The resulting features are 
the infinite clipped optimal functions. In this thesis, the experiment concludes 
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that this scheme is the most promising technique in the sense of best 
classification performance under the same compression requirement. 

The fourth approach extracts the zero crossing information from each 
optimal function and chooses those spectrum intervals that are in between two 
zero crossings as band features. Since the band features derived from each 
optimal function in this way might be linearly dependent [22], special precaution 
must be taken to get rid of linearly dependent bands. This method is called 
overlapping (O.L.) band feature selection algorithm because the bands derived 
by this scheme are overlapping. 

1.4 Preliminary Test of the On-Board Preprocessing System 

From a simplicity and effectiveness point of view, not all the four 
developed approaches are ideal for data preprocessing. Six preliminary test 
data sets are used to select the best technique. The goal is to find the most 
effective scheme under the simplicity requirement. Of the six sets of high 
spectral resolution field measurement data, three were taken over Williams 
County, North Dakota, each with 3 information classes: spring wheat, summer 
fallow and natural pasture. The other three were taken over Finney County, 
Kansas, again with 3 information classes each: winter wheat, summer fallow, 
and grain sorghum or other crops. For convenience, these data sets are 
referred to with a letter/number designator as shown in Table 1.1. 

These data were taken by the Field Spectrometer System (FSS) [23] 
mounted in a helicopter. The spectral resolution was 0.02 pm for the interval 
from 0.4 pm to 2.4 pm. 
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Table 1.1 Data Set Designation for Preliminary Test 


Location 

Date 

Designation 

#of Observ. 

Kansas 

9/28/76 

K1 

832 

Kansas 

5/03/77 

K2 

1551 

Kansas 

6/06/77 

K3 

1477 

N. Dakota 

5/08/77 

N1 

1265 

N. Dakota 

6/29/77 

N2 

1239 

N. Dakota 

8/04/77 

N3 

1 444 


For each of the six data sets, the collection of the spectral sample 
functions forms the ensemble of a random process. The mean vector and the 
covariance matrix of this ensemble are first estimated. The estimate of the 
covariance matrix is used to solve the generalized Karhunen-Loeve equation 
which results in the eigenvalues and the eigenvectors of the transform. Figure 
1.2 shows the magnitude of the first 12 eigenvectors associated with the largest 
eigenvalues for the data set K2 [15]. They will be used to explain the feature 
design schemes in chapter III. The spectral interval is 0.02 pm as stated 
previously. Therefore the dimensionality used in these preliminary tests is 100. 

From this preliminary test, it is concluded that the infinite clipped optimal 
transform is the simplest and most effective method for on-board data 
preprocessing. 




Magnitude Magnitude Magnitude 




Magnitude Magnitude Magnitude 


11 





12 


1.5 Outline of the Thesis 

In chapter 2, a theoretical review of the weighted K-L transform is given 
Two important properties, minimum mean square truncation error and 
uncorrelated transformed coefficients are proved for this generalized transform. 

Chapter 3 discusses in detail the four schemes developed to design the 
spectral features in high dimensional multispectral data. Two of them, non- 
overlapping band feature selection algorithm and overlapping band feature 
selection algorithm, are developed from the dominancy concept in 
eigenfunctions; and the other two, Walsh function approach and infinite clipped 
optimal function approach are derived from the idea of structure similarity 
between two sets of functions. Furthermore, a comparison among these data 
preprocessing schemes is included in this chapter. From the simplicity and 
effectiveness point of view, it is found that the infinite clipped optimal function 
approach is the best technique. After the preprocessed data would be received 
at the ground station, canonical analysis would be applied to the infinite 
clipped optimal transformed data to obtain maximal class separability. 

Chapter 4 shows the final results of this research. Both the vegetation 
data and the soil data are included in this chapter. The Hughes phenomenon is 
also discussed. 

Chapter 5 summaries the final conclusions and gives recommendations 
for the future work. 

An IBM 3083 Macro file used to run the spectral feature design system and the 
source code of the system are given in the appendices. 



CHAPTER II 

KARHUNEN-LOEVE TRANSFORM 


The Karhunen-Loeve (KL) expansion [44] was developed to represent 
random processes. It maps the continuous parameter random process into a 
sequence of random variables [24], The expansion generates a set of 
deterministic orthonormal basis functions. This set has a unique error- 
minimizing property and uncorrelated transformed coefficients. These 
properties make it the optimal coordinate system for many feature design 
problems. 

This transform can be generalized [25,26] to include a weighting function 
to account for certain types of a priori knowledge of the parameter set, and its 
proper use may have an important impact on the extraction of useful 
information [15]. Thus using the weighted form of K-L transform may result in 
more practical and realizable feature design. 

In the following we will show that minimum mean square truncation error 
(MMSE) and uncorrelated coefficients properties, which are directly related to 
this research, also hold for the generalized K-L transform. The MMSE property 
ensures that the eigenfunctions associated with the largest eigenvalues derived 
from the weighted K-L transform are the optimal basis functions in the sense of 
signal representation. Uncorrelated coefficients property guarantees that the 
transformed coordinates are independent under Gaussian assumption. 
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2.1 Minimum Mean Square Truncation Error 

Let X(X) be a sample function of a random process . Assume that the 
random process is continuous in probability and almost every sample function 
of the random process has finite norm in L 2 (A) space [27]. Then X(^) can be 
represented by an expansion of the form [24] 

oo 

X(X) - £ ^ (X) (2.1) 

i-1 

where the functions [Oj (X)} are deterministic and the expansion coefficients 
{ yj } are random variables. 

Define a weighting function W(X) with real finite positive values. Without 
loss of generality, the set {<X>j (X)} will be taken to be orthonormal with respect 
to W(X,). From the generalized inner product [27] which defines the metric in 
L 2 (A) space, we have 


(®., ®j) w =| ® (X) W (X) ®. (X) dX 

A 

and 


0 if i*j 

1 if i = j 


( 2 . 2 ) 


Vi - ( 0 i • x >w“J 4>i(X)W(X)X(X)dX (2.3) 

A 


If the set {Oj(A.)} is not orthonormal to begin with, it can be 
orthonormalized by the Gram-Schmidt procedure [57]. That such sets exists in 
L 2 (A) space has been demonstrated by the construction of sets such as 
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complex sinusoids, Legendre polymonials, Chebyshev polymonials, Laguerre 
functions, Walsh functions and others. 

Therefore Y = { y 1 , y 2 } is simply an orthonormal transformation of 

the random function X(?i) , and is itself a random vector. Each component of Y 
is a feature which contributes to representing the observed sample function 
X(X). 

Furthermore, we are going to choose a set (<I>j(A.)} which is complete in 
L 2 (A) space. That is, if we define the sequence 

CnW = Sy,o, W (2.4) 

i =1 


then, 



n 


4> .(X)] 2 W(X)dX } = 0 


(2.5) 


That the sequence c n (X) converges to X(A.) in the mean square sense, is 
denoted by 


X(X) = Urn. c n (k) ( 2 6) 

n->oo 

This convergence guarantees that the series can be made arbitrarily 
close to X(k) by increasing n in the expansion. 
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The problem of designing the optimal sensor then becomes that of 
selecting the set of complete orthonormal (CON) basis functions ( ^(A.) } such 

that the series representation will be optimal with respect to the minimum mean 
square error criterion. In the stochastic environment, this representation error is 
taken over the ensemble of the random process. Hence, we need : 


oo 

E { f [ X(X) - £ y, ® ,M ] 2 W(X.) dX ) = 0 (2.7) 

A i-1 

Another desirable property is that the convergence be rapid in the first 
few terms, that is, each additional term used in the series expansion decreases 
the representation error by a maximum amount . This property is called energy 
packing. 

In the real applications, however, it is impractical to transmit an infinite 
or even a very large number of channels to the ground. Therefore only a finite 
number of terms in the expansion would be used. Let n be a finite number such 
that the representation error by using the first n terms in the expansion is less 

than T, the maximal acceptable error. Then we require the selected orthonormal 
basis functions { Oj(A.) } to be complete in a finite n dimensional subspace of 

L 2 (A). That is, for any T > 0, there is an n Q such that 

n 

E{J[X(A.) - £ y. 0 ,(\)] 2 W(A.)dA.} < T ;n>n o (2.8) 

A i-1 


for any X(A.) defined in the L 2 (A) space. 
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This completeness property in finite dimensional space can guarantee 
that if we use the n dimensional subspace of L.2(A), spanned by the first n 
elements of a complete orthonormal set { <I>j(A,) }. for the representation of an 

arbitrary signal, then the norm of the error can be made arbitrarily small by 
choosing n sufficiently large. 

The objective then is to find the a finite set of orthonormal basis functions 
that have the above minimum representation error and energy packing 
properties. In the following, we are going to show that the eigenfunctions 
derived from the Weighted Karhunen-Loeve transform are just the desired 
optimal basis functions. 

In the above finite n dimensional subspace of 1.2(A), suppose only m 
terms in the expansion will be used to estimate the observed X(A.), then the 
estimate X(X) can be expressed in the following form 

m ,n 

= Xvi*, W + Zbi®, « < 2 - 9 ' 

i = 1 i - m+1 


The constants { b j } are preselected. The objective is to find the basis 
functions and the constants { b j } in such a way that the minimum mean square 
error can be obtained. 

Since we do not use all of the basis functions, the representation error 
due to truncation is then equal to 

n 

ax(X) = x{\) - k{\) =£ ( y 1 - b.)o.(X) 

i » m + 1 


( 2 . 10 ) 
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We define the weighted mean square error to be 

WMSE=E((AX,AX) w ) - E(£(y. - b.) ^(y, - b j )Jo,(A.)W(X)<l> j (A.)d>.) (2.1 1 ) 

i = m+1 ] - m+1 A 


Since the basis functions are orthonormal, Eq (2.1 1 ) reduces to 


n 

WMSE = £ E(y. -b. ) 2 (2.12) 

i* m+1 


The mean square error is minimized when 


3 E ( y. - b. ) 2 

X J \ I 1 

ab. 

f 


= - 2 E ( y . - b . ) = 0 


(2.13) 


That is, the preselected constant bj must be equal to the expected value 
of the transform component E(yj). 

We are left to show that when Oj (X) is a weighted K-L basis, then the 
weighted mean square error is minimized. We need to minimize 

WMSE=^E(y.-E(y.)) 2 =£ JJcD.(>.)W(X)K x (X,u)W(u)<I> j (u)dudX (2.14) 

i=m+1 i=m+lAA 


where K x (X,u) is the covariance function of the random process. 

Using the orthonormality constraint, we can write the mean square error 
as the quadratic functional [19] of Oj (X) 
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WMSE = X J J W(X) K x (X ’ u)W(u) (U) du d?l 

i = m + 1 a A 

' X M J - 1 } (2. 1 5) 

i*m+1 A 


Minimizing with respect to 3>j yields [19] 


( WMSE) = 2 J W (X.)K x (X,u)W(u)<I> j (u)du - 2X.W(X)0.(X) = 0 (2.1 6) 

' A 


The set { kj } thus turns out to be the eigenvalues of the covariance 
function of the observed X(^) , and the basis functions satisfy the weighted K-L 
equation 


J K x (X,u)W(u)0.(u)du = X. O.(X) i = 1,2, n (2.17) 

A 


From equations 2.14 and 2.17, we have 

WMSE = ^ Jo.(X)W(X) [ X. d>. (X) ] 6X ( 2 . 1 8 ) 

i « m+1 A 


or 

n 

WMSE = X. 

i»m+1 


(2.19) 
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If we rank the optimal functions according to the magnitudes of their 
associated eigenvalues from the largest to the smallest, then using the first few 
optimal functions in the series representation will results in the desired 
weighted minimum mean square error. Furthermore, the energy packing 
property will also be satisfied since the mean square error reduction for using 
each additional term in the expansion will be maximized. 


2.2 Uncorrelated Transformed Coefficients 


The generalized K-L transform results in uncorrelated coefficients. This 
property can be derived as follows. Since 

y = { y r y 2 . y n } (2.20) 

where 


y, - J (X) W(X) X(X) dX 

A 


( 2 . 21 ) 


and the covariance between yj and yj is defined as 


a. . = E(y.- E ( y. ) )( y. - E ( y. ) ) 


( 2 . 22 ) 


Using Eq.(2.21), Eq.(2.22) becomes 
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Therefore the transformed coefficients are uncorrelated. If the underlying 
distribution of the random process is Gaussian, the coefficients are then 
independent. 
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CHAPTER III 

SPECTRAL FEATURE DESIGN 


From the discussion in chapter 2, we know the weighted K-L transform 
preserves the minimum weighted mean square error (MWMSE) and ordered 
uncorrelated coefficients properties. In fact, the K-L transform is a special case 
of its generalized form with unity weight matrix. The fundamentals in remote 
sensing indicate [14,15] that the eigenfunctions derived in the K-L transform 
with unity weight matrix can not be used satisfactorily for feature design. The 
reason for this is basically the fact that the reflectance around the two water 
absorption bands has high variance and thus tends to dominate the formation of 
the basis functions. Therefore the spectral response in these two regions is not 
information-bearing. Indeed, the spectral radiance emanates mostly from the 
atmosphere and must be considered as noise. Understanding this important a 
priori knowledge about the scene, we can incorporate a weighting function into 
the calculation process to eliminate the effect of noise. The generalized K-L 
transform is then the solution. The resulting optimal functions can be used to 
transform the original observation space into a new feature space. 

In this chapter, four spectral feature design techniques will be presented 
first. Using simplicity and effectiveness as criteria, the most promising 
technique is then selected from these four schemes for our final feature design 
system. The four techniques developed in the course of this research are 
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1. Non-overlapping band feature selection algorithm, 

2. Walsh function approach, 

3. Infinite clipped optimal function approach, and 

4. Overlapping band feature selection algorithm. 

The non-overlapping and overlapping band feature selection algorithms 
are derived from the shape of the optimal features. The Walsh function 
approach and the infinite clipped optimal function approach are developed from 
the structure of the optimal features. 

After performing the generalized K-L transformation to the data [15], 
where a weight function is incorporated into the transform to avoid portions of 
the spectrum where the atmosphere is known to be opaque, the eigenfunctions 
can be found. These eigenfunctions serve as optimal features that linearly 
transform the original measurement space to the new space in a minimum 
mean square error sense [18]. However, because of the inherently complex 
nature of the optimal functions, an easy and fast implementation directly using 
them to process the tremendous amount of information collected must be found. 
Therefore, more realistic features are sought in order to achieve this 
requirement. More realistic features can be found by carefully studying the 
shapes of the first few eigenfunctions. The importance of a wavelength region 
for the purpose of accurately representing the ensemble of functions is 
indicated by the magnitude of the eigenfunctions in that region. It is 
hypothesized that the importance of a region in an ensemble-representational 
sense is positively correlated with (though not identical to) its importance with 
respect to classification accuracy. Referring to Figure 1.2, it is observed that 
each eigenfunction thus points to the more important regions. 
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For instance, the magnitude of the first eigenfunction indicates that there 
are 3 important regions over the entire spectrum: 0.4-1.28 pm, 1.48-1.78 pm 
and 1.98-2.4 pm, the magnitude of the second eigenfunction indicates that 
important regions are approximately 0.4-0.66 pm, 0.66-1.28 pm, 1.48-1.78 pm 
and 1 .98-2.4 pm, etc. From the fact that the magnitude of the first eigenfunction 
is very similar to the soil response function, and the magnitude of the second 
eigenfunction is similar to the vegetation curve, it is observed that the dominant 
portion of the ensemble, i.e. summer fallow , winter wheat and unknown crops 
for this data set K2, can be shown in the first few eigenfunctions derived from 
the weighted K-L transform. Therefore, it is desired to choose the regions with 
larger magnitude in the eigenfunctions, especially from those with largest 
eigenvalues, as sensor bands since these regions contribute most to reduction 
of representation error as well as increasing of classification performance. 

However, such a subjective process is difficult to carry out objectively due 
to the spectral detail in the eigenfunctions and the number of eigenfunctions to 
be examined. A machine implemented band selection algorithm based on this 
dominancy concept in the eigenfunctions is thus sought. 


3.1 Non-Overlapping Band Feature Selection Algorithm 

Infinite clipping is a procedure used to transform the signal into its signed 
form [21]. There is evidence in various circumstances that the axis crossings of 
a signal carry a substantial portion of the information that the signal carries. For 
example, in the field of speech recognition [28-33], the infinite clipping 
procedure can been used to extract zero crossing information and perform 
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speech recovery very successfully. For example, Ewing and Taylor [29] showed 
that zero-crossing information from a speech signal is a feasible way for 
computer speech recognition; and Niederjohn, et al [30] showed that the set of 
zero-crossings of a speech waveform represents a nearly minimal set of 
informational attributes in the sense that any reordering or averaging of the 
zero-crossing intervals has a detrimental effect upon speech intelligibility. 

Some other examples of using zero-crossing information of a signal can 
also be found in the fields of radar target detection [51-52], biomedical 
engineering [53], communications [54-55] and image processing [56]. Rainal 
[52] described a zero-crossing principle for detecting weak narrow-band signals 
immersed in Gaussian noise. An application of the zero-crossing principle to 
the detection problem of a stationary radar target in clutter was discussed. 
Masuda, et al [53] demonstrated in a biomedical context that the muscle fiber 
conduction velocity, which is known to be an index of the degree of muscle 
fatigue or muscle disease, can be accurately measured by using zero-crossing 
information from a surface electromyogram signal. In conventional 
communications, Voelcker [54] showed that an angle-modulated signal can be 
demodulated given only its zero-crossings; Wiley, et al [55] proposed an 
iterative demodulation procedure for very wide-band FM by use of a zero- 
crossing discriminator. Haralick [56] showed that the zero-crossing of second 
directional derivatives within the pixel's area can be used to detect the step 
edges in the image. 

Thus, one possible approach to finding the desired procedure would be 
to apply infinite clipping to extract the zero crossing information. The input to 
this algorithm will be the average of the first few eigenfunctions. The output of 
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this algorithm is to be the band edges showing how the bands should be 
chosen. We will refer to this procedure as the non-overlapping (N.O.L.) band 
feature selection algorithm. Figure 3.1 shows the average of the first 12 
eigenfunctions. After thresholding, the data of Figure 3.1 become as in Figure 
3.2 where +1 represents the positive portions of Figure 3.1, -1 represents the 
negative portions, and 0 represents the water absorption bands centered at 1 .4 
and 1 .9 pm respectively. It should be noted that there is no response over the 
above water absorption bands due to the use of the weight function in the K-L 
transform, which has been set 1.0 over the entire spectrum and a very small 
positive value in the water bands. 

The band edges are found as follows: whenever a transition in sign or 
magnitude occurs in Figure 3.2, the wavelength of the associated band is 
recorded. Table 3.1 shows the results after transition operation. The band 
edges in Table 3.1 can be used to set up the suboptimal basis functions for data 
compression [ refer to the 2nd column in Table 3.6 ]. 


Table 3.1. Band Edges Obtained by Infinite Clipping of the Average 
of the First 12 Eigenvectors for Data Set K2 


Band 



0.40 - 0.68 


0.68 - 0.90 


0.90 - 0.92 


0.92 - 0.94 


0.94 - 1.00 


1.00 - 1.06 


1.06 - 1.12 


1.12 - 1.26 


1.26 - 1.28 

10 

1.48 - 1.78 

1 1 

1.98 - 2.40 
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Figure 3.1 



Average of the First 12 Eigenvectors of Data Set K2 



Figure 3.2 Thresholded Version of Figure 3.1 
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3.2 Walsh Function Approach 

By carefully viewing the structure of the eigenfunctions in Figures 1.2, 
one may also observed that the eigenfunctions corresponding to the larger 
eigenvalues tend to have coarser structure than those with smaller eigenvalues. 
A similar effect exists in the Walsh functions indexed by the number of zero- 
crossings. The higher the index of the Walsh function, the finer the structure of 
the function [19,20]. The first 10 Walsh functions indexed by the number of axis 
crossings are shown in Figure 3.3, where curve 0 is the first Walsh function with 
no axis crossing, curve 1 is the second Walsh function with one axis crossing, 
etc. 


The inner product of the two functions may be thought of as a 
mathematical measure of similarity of the two functions. The absolute values of 
the inner products of the first 16 eigenfunctions with the first 64 Walsh functions 
are calculated. Table 3.2 shows part of the results. Absolute values of the inner 
product are used since the polarity is not significant here. Table 3.3 shows the 
similarity relation between these two sets of functions. For example, the number 
"1" in the (1,1) matrix position indicates that the first eigenfunction is more 
similar to the first Walsh function than to any other 63 Walsh functions since the 
value 0.84 in Table 3.2 is the largest in the " first " column. The numbers "2", "3" 
and "4" in the (1,2), (1,3) and (1,4) matrix positions indicate that the 2nd, 3rd 
and 4th eigenfunctions mostly look like the 2nd, 3rd and 4th Walsh functions 
respectively in the sense of signal structure similarity. Therefore, the structure 
of the first 4 eigenfunctions can be approximated by that of the first 4 Walsh 
functions. By observing the first two rows of Table 3.3, it can be concluded that 
the first 16 eigenfunctions and the first 16 Walsh functions have approximately 
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the same structure. The structure in the eigenfunctions is related to the axis 
crossings in the signals. The coarser the structure, the less the number of axis 
crossings; and vice versa. These axis crossings are hypothesized to contain 
important information that can be used for classification. Therefore, it is 
feasible to use the first few Walsh functions as spectral features in high 
dimensional multispectral data. 
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Table 3.2 Absolute Values of Inner Products Between 
Optimal Functions and Walsh Functions 





31 


Table 3.3 Similarity Relation Between Optimal 
Functions and Walsh Functions 
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3.3 Infinite Clipped Optimal Function Approach 

If one studies the Walsh functions more carefully , it is found that although 
the Walsh functions approximate the optimal functions in the sense of structure 
similarity, they do distort some of the spectral spacing information in the optimal 
functions. The axis crossing separation in the optimal functions is a relatively 
irregular pattern, while it is quite regular in the Walsh functions. 

One way that can be applied to avoid this information loss is to use the 
infinite clipped optimal functions as spectral features. The infinite clipped 
optimal function approach preserves the zero-crossing information in the 
optimal functions which is hypothesized to contain important spectral 
information that can be used for classification. 

Furthermore, the Walsh function approach is less flexible than the infinite 
clipped optimal function approach since the spectral features using the Walsh 
functions tend to be fixed for all analysis situations; while, on the other hand, the 
infinite clipped optimal function approach does give some degree of 
adaptability. Figure 3.4 shows the infinite clipping versions of the first 6 
eigenfunctions for data set K2. 

The infinite clipped optimal functions, derived from the signs of the 
optimal functions, are then used as spectral features (i.e., basis functions) to 
linearly transform the high dimensional multispectral data to the ground station 
for further processing. 
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3.4 Overlapping Band Feature Selection Algorithm 

The overlapping band feature selection algorithm originates from the 
inherent overlapping property of the optimal functions. This property suggests 
that overlapping bands might be even more powerful for spectral feature 
design. The idea of this algorithm is to find the locations of the important 
spectral bands without imposing the additional restriction that the bands be 
non-overlapping. The basic procedures used are very similar to those in the 
non-overlapping band feature selection algorithm. In the non-overlapping band 
feature selection algorithm, the infinite clipping procedure is applied to the 
average of the first few eigenfunctions in order to extract the information of the 
important spectral bands; while in this overlapping case, the infinite clipping 
procedure is applied to each individual eigenfunction. 

The first step is to find the band edges of each individual eigenfunction. 
Table 3.4 shows part of the results for data set K2. In Table 3.4, comparing to 
Figure 1 .2, it is found that there are 3 important bands for the first eigenfunction, 
4 for the 2nd one, 8 for the 3rd one, etc. 

It should be noted that the band features derived in this way are not all 
linearly independent. For example, the first and second band feature from the 
second eigenfunction, that is, 0.40-0.66 pm and 0.66-1.28 pm, are linearly 
dependent on the first band feature from the first eigenfunction ( 0.40-1.28 pm ). 
Another example is the identical band features ( 1.48-1.78 and 1.98-2.40 pm ) 
derived from the first 5 eigenfunctions. Indeed, these repeated bands and the 
bands which are linearly dependent on the previously selected bands can not 
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be used as spectral features since linearly dependent features will result in 
singular class covariance matrix. 


Table 3.4 Linearly Dependent Bands Found by Overlapping 
Band Feature Selection Algorithm for Data Set K2 



1 

2 

3 

BAND 




1 

0.40 - 1.28 

0.40 - 0.66 

0.40 - 0.94 

2 

1 .48 - 1 .78 

0.66-1.28 

0.94-1.00 

3 

1.98-2.40 

1.48-1.78 

1.00 - 1.02 

4 


1.98-2.40 

1.02 - 1.12 

5 



1.12-1.16 

6 



1.16 - 1.28 

7 



1 .48 - 1 .78 

8 



1.98-2.40 


------ 





Eigenvector# 

4 

5 

6 

BAND 




1 

0.40 - 0.92 

0.40 - 0.70 

0.40 - 0.44 

2 

0.92 - 1.26 

0.70 - 0.92 

0.44 - 0.50 

3 

1.26 - 1.28 

0.92 - 0.96 

0.50 - 0.52 

4 

1.48-1.78 

0.96-1.06 

0.52 - 0.66 

5 

1.98-2.40 

1.06- 1.28 

0.66 - 0.84 

6 


1 .48 - 1 .78 

0.84 - 0.92 

7 


1.98-2.40 

0.92 - 0.94 

8 



0.94-1.00 

9 



1.00-1.04 

10 



1.04-1.12 

1 1 



1.12-1.28 

12 



1.48- 1.64 

13 



1.64-1.78 

14 



1.98-2.20 

15 



2.20 - 2.40 


An algorithm is developed to automatically choose the linearly 
independent bands from the first 6 eigenfunctions. Table 3.5 shows the result. 
Basically, this algorithm checks the rank of the matrix consisting of the bands 
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derived in Table 3.4. First, the linearly dependent bands in Table 3.4 are 
ranked from the widest to the narrowest. Then, starting from the widest band, 
this algorithm checks the matrix rank. If the rank is less than the total number of 
the band features, the band features in the matrix are linearly dependent, the 
widest linearly dependent band in the matrix is then eliminated from the set. On 
the other hand, if the rank is equal to the total number of the band features, 
increase the matrix rank by one and test the next widest band. 

The procedure used in the above overlapping band feature selection 
algorithm can find the largest set of smallest bands that are linearly 
independent . This procedure can be summarized as follows . 

(1) Find the band edges of each individual eigenfunction 

(2) Rank these linearly dependent bands from the widest to the 
narrowest, then set rank n = 1 

(3) Starting from the widest band, check the rank of the feature matrix 

(4) If the rank is less than the total number of the bands, eliminate the 
widest linearly dependent band in the matrix, then go to step (3) to 
test the next widest band; 

(5) If the rank is equal to the total number of the bands, increase n 
by 1 , then go to step (3) to test the next widest band 

(6) Set up the final feature set 
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Table 3.5 Linearly Independent Bands Found by Overlapping 
Band Feature Selection Algorithm for Data Set K2 


Band 

wavelength (pm) 

| 1 

0.70 - 0.92 

2 

1.98 - 2.20 

3 

2.20 - 2.40 

4 

0.66 - 0.84 

5 

1.48 - 1.64 

6 

0.52 - 0.66 

7 

1.64 - 1.78 

8 

1.16 - 1.28 

9 

0.96 - 1.06 

10 

1.04 - 1.12 

11 

0.94 - 1.00 

12 

0.44 - 0.50 

13 

1.12 - 1.16 

14 

0.92 - 0.96 

15 

0.40 - 0.44 

16 

1.00 - 1.04 

17 

1.00 - 1.02 

18 

1.26 - 1.28 

19 

0.50 - 0.52 

20 

0.92 - 0.94 


3.5 Experimental System 

In order to process the data in a digital computer, the spectral reflectance 
function X(X), the weight function W (X), the optimal basis function Oj (X) and 
the sequence of the optimal basis functions O(X) are represented by their 
discrete approximations, vector X, diagonal matrix W, basis vector Oj and the 
matrix O respectively. 

An experimental software system has been set up to test the four 
approaches developed in the previous sections. This system has been 
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implemented on IBM 3083 computer. A collection of field data consisting of 
spectral sample functions on three dates from Williams County, ND, and three 
dates from Finney County, KS, was available from the field measurement library 
at Purdue/LARS. The spectral functions were sampled at 0.02 pm over the 
range 0.4 to 2.4 pm, therefore, the dimensionality is 100. 

The optimal features are found numerically by estimating the covariance 
matrix from the sample functions. Maximum likelihood estimates of the mean 
and covariance matrix are given [34] by 

= E(X) - X =-i-X X. (3.1) 

1 s i-1 

and 

N , 

k, « TrrrXfXi-xxXi-x) 1 (3.2) 

IN s 1 i-1 

where N s is the number of the sample functions and Xj is the i^ sample vector. 
The covariance matrix is then used to solve the discrete form of the generalized 
Karhunen Loeve Equation [14,15] : 

K x W 0> = O T (3.3) 

where the <I>, r and W are the eigenvectors, eigenvalues and the weight 
matrix, respectively. The solutions of the equation are the optimal features. 

In order to find appropriate non-overlapping bands used in feature 
design, the non-overlapping band feature selection algorithm is applied to the 
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average of the first few eigenvectors. Three cases were studied, tests using the 
first 6, 12 or 24 eigenvectors in the algorithm. For the illustrative example shown 
in section 3.1, the second case is considered. 

For overlapping band features, the infinite clipping procedure is applied 
to each individual eigenfunction. In this preliminary test the first 6 
eigenfunctions from each of the 6 data sets are used. The locations of the 
important spectral bands are then extracted. After applying the overlapping 
band feature selection algorithm to the spectral bands derived above, the 
desired linearly independent (L.l.) band features are found. 

The bands found by the above two algorithms, the Walsh functions or the 
infinite clipped optimal features developed from the structure similarity property 
are then used as spectral features to perform the linear transformation on the 
data sets. 

y. = O^WX (3.4) 

In order to test the spectral features thus determined, the probability of 
correct classification is estimated using them. To do so, the class-conditional 
statistics are first computed using the transformed data. An algorithm based on 
the maximum likelihood estimator [34] is then applied, where the class 
conditional statistics are assumed to be multivariate Gaussian. 
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3.6 Preliminary Results 

After applying the N.O.L. band feature selection algorithm to the average 
of the first 6, 12 or 24 eigenvectors of the six test data sets, the band edges are 
found. Table 3.6 shows the results for the data set K2 for three different number 
of eigenvectors. These three feature sets are named as proposed sensor Cl, 
C2 and C3 respectively. For brevity, they are denoted PCI, PC2 and PC3. On 
the other hand, the O.L. band feature selection algorithm is applied to the first 6 
eigenfunctions, the result of the first 16 linearly independent bands is shown in 
Table 3.7 for data set K2. 

Furthermore, the probabilities of correct classification using Landsat (LS) 
MSS bands, Thematic Mapper (TM) bands and the two sensors proposed in 
Wiersma's work (PA and PB) [14,15] are also computed here. Table 3.8 shows 
the band edges associated with each sensor [15]. 


Table 3.6 Bands Found by Non-Overlapping Band 

Feature Selection Algorithm for Data Set K2 
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Table 3.7 Bands Found by Overlapping Band Feature 
Selection Algorithm for Data Set K2 


Band 


1 1 

0.70 - 0.92 

! 2 1 

1.98 - 2.20 

1 3 

2.20 - 2.40 

4 

0.66 - 0.84 

5 

1.48 - 1.64 

6 

0.52 - 0.66 

7 

1.64 - 1.78 

8 

1.16 - 1.28 

9 

0.96 - 1.06 

10 

1.04 - 1.12 

11 

0.94 - 1.00 

12 

0.44 - 0.50 

13 

1.12 - 1.16 

14 

0.92 - 0.96 

15 

0.40 - 0.44 

16 

1.00 - 1.04 


Figures 3.5 to 3.10 are the classification performance comparisons of the 
optimal functions (Optimal), Walsh functions (Walsh) and the infinite clipped 
optimal functions (Clipped) for the 6 data sets. Figure 3.11 to 16 are the 
comparisons of the LS, TM, Wiersma's proposed sensor PA, non-overlapping 
band features (NOL) derived from the first 24 eigenfunctions (i.e., PC3), 
overlapping band features (OL), Walsh functions, infinite clipped optimal 
functions and optimal functions for the 6 preliminary test data sets. From the 
implementation point of view, since there are only two values (+1, -1) for the 
Walsh functions and three values (+1, -1, 0 ) for the infinite clipped optimal 
functions, it can be concluded from Figures 3.5 to 3.16 that representing the 
optimal features using their infinite clipping versions or using the first 16 Walsh 
functions produces the more practical features used for classification which 
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provide a classification accuracy quite near that of optimal features. The 
classification performances estimated for the above sensors are shown in Table 
3.9, where PCI , PC2 and PC3 represent the sensors derived from N.O.L. 
band feature selection algorithm using the first 6, 12 and 24 eigenvectors as 
their input respectively; Optimal, Walsh and Clipped stand for the sensors using 
the first 16 optimal functions, the first 16 Walsh functions and the first 16 infinite 
dipped optimal functions as spectral features respectively. 


Table 3.8 Band Edges of Landsat MSS, TM, PA and PB Sensors 


Band 

LS 

TM 

PA 

PB 

1 

0.50-0.60 

0.45-0.52 

0.42-0.54 

0.42-0.66 

2 

0.60-0.70 

0. 52-. 060 

0.56-0.66 

0.68-0.70 

3 

0.70-0.80 

0.63-0.69 

0.68-0.70 

0.72-0.92 

4 

0.80-1.10 

0.76-0.90 

0.72-0.90 

0.94-1.04 

5 


1.55-1.75 

0.92-1.00 

1,06-1.10 | 

6 


2.08-2.35 

1.02-1.30 

1.12-1.30 

7 



1.52-1.74 

1.52-1.74 

8 



1.96-2.40 

1.96-2.40 


Table 3.9 Probability of Correct Classification for 6 Data Sets 


SENSOR 

K1 

K2 

K3 




LS 

0.90 

0.78 

0.85 

0.77 

0.83 

0.96 

TM 

0.92 

0.79 

0.93 

0.89 

0.95 

0.99 

PA 

0.94 

0.86 

0.95 

0.92 

0.96 

0.99 

PB 

0.94 

0.85 

0.94 

0.89 

0.96 

0.96 

PCI 

0.94 

0.87 

0.96 

0.92 

0.97 

0.99 

PC2 

0.96 

0.88 

0.97 

0.94 

0.97 

0.99 

PC3 (NOL) 

0.96 

0.94 

0.98 

0.96 

0.98 

0.99 

OL 

0.97 

0.94 

0.98 

0.97 

0.99 

0.99 

Walsh 

0.98 

0.95 

0.98 

0.95 

0.98 

0.99 

Clipped 

0.98 

0.97 

0.99 

0.97 

0.99 

0.99 

Optimal 

0.98 

0.97 

0.98 

0.97 

0.99 

0.99 
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Pc 


Figure 3.5 


Kansas September Data 



Number of Features 


Performance Comparison of Optimal, Infinite Clipped 
Optimal and Walsh Functions for Data Set K1 
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Pc 


Figure 3.6 


Kansas May Data 



Number of Features 


Performance Comparison of Optimal, Infinite Clipped 
Optimal and Walsh Functions for Data Set K2 
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Pc 


Figure 3.7 


Kansas June Data 



Number of Features 


Performance Comparison of Optimal, Infinite Clipped 
Optimal and Walsh Functions for Data Set K3 
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Figure 3.8 


North Dakota May Data 



Performance Comparison of Optimal, Infinite Clipped 
Optimal and Walsh Functions for Data Set N1 
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Pc 


Figure 3.9 


North Dakota June Data 



Performance Comparison of Optimal, Infinite Clipped 
Optimal and Walsh Functions for Data Set N2 
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Figure 3.10 Performance Comparison of Optimal, Infinite Clipped 
Optimal and Walsh Functions for Data Set N3 


North Dakota August Data 
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North Dakota June Data 



LS TM PA NCL CL Walsh Clipped Optimal 

Sensor 


Figure 3.15 Performance Comparison for Data Set N2 
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Figure 3.16 Performan 
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3.7 Selection of the Best On-Board Preprocessing Scheme 

From Table 3.9 and Figures 3.5 to 3.16, it is seen that the four 
approaches developed in this research, two based on the " shape M of the 
optimal features and the other two from their "structure" similarity with the 
optimal functions, are feasible ways for feature design. 

The fundamental objective of this research is to develop an objective and 
practical spectral feature design technique for high dimensional multispectral 
data. There are two important factors, simplicity and effectiveness, which must 
be considered in this respect. 

First of all, from simplicity point of view, the overlapping band feature 
selection algorithm is harder to perform than the other three because of the 
existence of linear dependence problem. In order to find appropriate 
overlapping band features, we have to check the rank of the matrix for each 
newly selected band. This procedure needs more time than the other three 
approaches. However, its classification performance [ referring to Figure 3.11 to 
3.16 ] does not indicate much advantage over the other three, especially the 
infinite clipped optimal function approach. 

For example, Figure 3.11 and 3.12 show that for Kansas September 
and Kansas May data the performances of the overlapping band feature 
selection algorithm are the 3rd best among the four techniques. The infinite 
clipped optimal function approach and the Walsh function approach have better 
performances than that of the overlapping band feature selection algorithm. 
Figure 3.13 to 3.16 indicate that the performances of the overlapping band 
feature selection algorithm are never better than those of the infinite clipped 
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optimal function approach. Therefore, from simplicity point of view, the 
overlapping band feature selection algorithm would not be used in this thesis as 
the best technique for the final data preprocessing system. 

On the other hand, from effectiveness point of view, referring to Table 
3.9 and Figure 3.5 to 3.16 again, it is shown that the infinite clipped optimal 
transform has better performance than the Walsh transform and the non- 
overlapping band feature selection algorithm. 

For instance, Figure 3.5 to 3.10 indicate that the infinite clipped optimal 
features have better classification accuracy than the Walsh features for all the 
six preliminary test data sets in Kansas and North Dakota. Figure 3.11 to 3.16 
show that the infinite clipped optimal features perform better than the non- 
overlapping band features for all the 6 test data sets except for North Dakota 
August data (Figure 3.16) where these two techniques have the same 
performance. 

Therefore, from simplicity and effectiveness point of view, the infinite 
clipped optimal transform is chosen to be the best scheme in the data 
preprocessing stage of the spectral feature design system. 

The processing up to this point, consisting of the optimal features 
calculation, the infinite clipping, and the data transform is based solely upon the 
ensemble statistics of the field data. Additional a priori knowledge that might 
be used to improve the performance is the class statistics of the scene. The 
objective is then to find the best features under the criterion of maximal class 
separability. 



3.8 Canonical Analysis and Ground Station Data Processing. 


Canonical Analysis is a technique that can be used to find the optimal 
features under a maximal separability criterion [36-41], Unlike principal 
component analysis, which is based on the global covariance matrix of the full 
data set, canonical analysis utilizes the class structure of the data. The 
advantage of canonical analysis is its ordering property on the separability 
measure. By using the features derived from canonical analysis to further 
process the received transformed data, the classification performance should, 
therefore, be improved. 

Let Mj and Sj be the i ,h class mean vector and covariance matrix of a 
data set with L classes. In canonical analysis one first finds the within-class 
scatter and the among-class scatter matrices S w and S a respectively : 

L (N -1) 

S w = I-fj - * s i (3-5) 

i=l s 

where Nj is the number of samples of the i th class data and N s is the total 
number of samples of the ensemble. And, 

s a - f £ (M, - M 0 )(M. ■ m/ (3.6) 

i=1 

where M 0 is the global mean, given by 
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The within class scatter matrix, S w , is an average quantity that describes 
how closely the samples are distributed around their glass means while the 
among class scatter matrix, S a , is a quantity measuring the average degree of 
closeness between the ensemble mean and each class mean. The optimally 
separable feature is a feature such that S w is minimized and S a is maximized 
after the transformation. Define a quantity r and let the desired feature be 
vector d. Then the objective is to find the r and d that result in maximal 
class separability. That is, 

dVd 

r = (3.8) 

d S d 

w 

must be maximized. The ratio of variances in the new space is maximized by 
the selection of feature d if, 


= o 

5d (3.9) 

The above equation can be reduced to 

(S a - r * S w ) * d = 0 (3.10) 

which is called a generalized eigenvalue equation and must be solved now for 
the unknown r and d. The first canonical axis will be in the direction of d, and r 
will give the associated ratio of among-class to within-class variance for that 


axis. 
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The development to this stage is usually referred to as discriminant 
analysis. One more step is included in the case of canonical analysis where 
the derived canonical features are normalized with respect to the within class 
scatter matrix. That is, 

DT*S w *D = I (3.11) 

where D is the matrix of canonical features d. This says that the within class 
scatter matrix after the transformation must be the identity matrix. In other 
words, after transformation, the classes should appear spherical. 
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CHAPTER IV 

RESULTS AND DISCUSSIONS 

In the previous chapter, we have introduced the four spectral feature 
design techniques developed in the course of this research. Six preliminary 
test data sets in Kansas and North Dakota were used to test the schemes. From 
a simplicity and effectiveness point of view, the infinite clipped optimal 
transform is chosen as the better means for data preprocessing. Furthermore, 
canonical analysis is applied to the above received transformed data on the 
ground station to achieve the maximal class separability. In this chapter, both 
the vegetation and the soil data will be used to find the classification 
performance for the final spectral feature design system. The spectral range for 
the vegetation data is from 0.4 pm to 2.4 pm with resolution 0.02 pm while the 
range for the soil data is from 0.45 pm to 2.45 pm with resolution 0.01 pm. 
Therefore the dimensionality for the vegetation data and the soil data is 100 and 
200 respectively. The final results of these data will be presented in section 
4.1 and 4.2. Moreover, due to the limited sample size of the data set to 
estimate the covariance matrix, different degree of Hughes phenomenon 
occurs in some of the one-day Kansas and North Dakota vegetation data sets 
as well as in all soil data sets. This effect will be discussed in section 4.3. 
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4.1 Vegetation Data 

Four sets of multitemporal multispectral data collected in Kansas, North 
Dakota, Iowa and South Dakota are acquired to test the proposed spectral 
feature design system. Table 4,1 show the species, the dates on which the 
data were collected, and the total numbers of sample functions for each 
information class. In Table 4.1, the numbers appearing in the parentheses are 
the total numbers of sample functions collected for that class. Furthermore, 
W.Wheat and S.Wheat stand for winter wheat and spring wheat respectively. 

Figure 4.1 to 4.6 show the probability of correct classification, Pc, using 
the optimal features, infinite clipped optimal features and features that are 
derived from infinite clipped optimal transform and canonical analysis for the six 
preliminary test data sets. These 6 data sets are part of the multitemporal data 
in Kansas and North Dakota ( referring to Table 1.1 and Table 4.1 ). Each one 
of them consists of the sample functions collected on one single date and has 3 
informational classes. The results indicate that using the first 16 infinite clipped 
versions of the optimal functions, 95% classification accuracy can be achieved. 

Another important point is the occurrence of Hughes phenomenon 
[42,43] shown in Figure 4.1 to 4.4. It says that for data set K1, K2, K3 and N1, 
increasing the computational complexity [11] does not always increase the 
classification performance. For example, Figure 4.1 shows that canonical 
analysis improves the accuracy for the first 3 features, but it does not help 
beyond this complexity for data set K1 . Figure 4.2 to 4.4 show that canonical 
analysis can only have better performance for the first 4 features for data sets 
K2, K3 and N1 respectively. 
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For data set N2 and N3, it is found in Figure 4.5 and 4.6 that Hughes 
phenomenon does not occur, and the classification performance using the 
features derived from infinite clipped optimal transform and canonical analysis 
is always better than those of the optimal features and the infinite clipped 
optimal features. It is also shown that only 2 features are needed to have about 
94% and 99% classification accuracy for these 2 data sets respectively. 

Figure 4.7 and 4.8 show the results for Kansas and North Dakota multi- 
temporal data. Each one has 9 information classes collected on 3 different 
dates from 1976 to 1977. The results indicate that canonical analysis improves 
the accuracy by about 15% to 25% for the first feature and about 1% for the first 
16 features. Figure 4.9 is the results of Kansas and North Dakota combined 
data with 18 information classes. It is used to show the robustness property of 
this spectral feature design system. The results show that the technique is not 
overly sensitive for spatially and temporally combined data. 

Figure 4.10 and 4.11 are the results for 25-class Iowa and 42-class 
South Dakota multi-temporal data. They are used to show the capability of this 
spectral feature design system for complex data sets. It can be seen that the 
system is very successful in this respect. 
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Table 4.1 : Vegetation Data Sets. 

Numbers In the parenthesis are the total numbers of samples. 


Kansas Vegetation Data Set : 9 classes 


9/28/76 

5/3/77 

6/26/77 

W. Wheat ( 141 ) 

W. Wheat ( 658 ) 

W.Wheat (677) 

Summer Fallow (414) 

Summer Fallow ( 211 ) 

Summer Fallow ( 643 ) 

Sorghum ( 277 ) 

Unknown Class ( 682 ) 

■ Ml— 


North Dakota Vegetation Data Set : 9 classes 


I 5/8/77 

6/29/77 

8/4/77 


S. Wheat ( 787 ) 

S. Wheat ( 931 ) 

Summer Fallow (437 ) 

Summer Fallow ( 291 ) 

Summer Fallow ( 330 ) 

Pasture ( 164 ) 


Pasture/ 183) 



South Dakota Vegetation Data Set : 42 classes collected on 6 different dates of 1978 and 1979 
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i.o- 

0.9- 


0.8 H 


0.7 H 


0.6 H 


Kansas May Data 
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North Dakota June Data 
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North Dakota Vegetation Data 



Number of Features 


Figure 4.8 Classification Performance for N. Dakota 
Multitemporal Data Set 



2 


Kansas and North Dakota Combined Data 



0 1 2 3 4 5 6 7 8 9 10111213141516 


Number of Features 


Figure 4.9 Classification Performance for 
KS/ND Combined Data Set 
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South Dakota Vegetation Data 



Number of Features 


Figure 4.11 Classification Performance for S. Dakota 
Multitemporal Data Set 
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4.2 Soil Data 

In addition to the above FSS vegetation data, a soil data base with 571 
soil samples collected by Eric Stoner [45] in 1979 was acquired to test the 
system. The soil reflectance functions were measured by an EXOTECH-C 
spectrometer in the laboratory. In this research, five data sets grouped by soil 
order, organic matter content #1, organic matter content #2, Iron oxide content 
and soil texture [46-50] were formed respectively to test the spectral feature 
design system. They are designated as data sets SO, OM1, OM2, 10 and ST 
respectively. It should be noted that the same soil samples are used in the data 
sets, but they are only grouped differently into classes. The soil data set 
designated as organic matter content #1 is from the soil orders Mollisol and 
Alfisol [48] only, while the soil data set designated as organic matter content #2 
is from all soil orders. These 5 soil data sets are shown in Table 4.2 

Table 4.2(a) shows the 10 soil orders in American Soil Taxonomy [48]. 
Since the total numbers of sample functions for Spodosol, Vertisol, Histosol and 
Oxisol are very limited, in this research, these soils are not used to form the 
data set SO. Only the data in the first 6 soil orders are included in SO. Table 
4.2(b), (c) and (d) indicate the ranges of organic matter content #1, organic 
matter content #2 and iron oxide content respectively. Six classes are chosen 
in these 3 data sets: 0M1 , 0M2 and 10. Table 4.2(e) shows the 6 soil texture 
classes used in data set ST where some of the classes consist of more than one 
soil texture group. For example, class 1 in data set ST includes clay and silty 
clay; class 2 includes sandy clay loam, clay loam and silty clay loam; etc. 

The results of these 5 soil data sets are shown in Figure 4.12 to 4.16. 
Taking a general view of these graphs, it is found that the cumulative 
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performances of these soil data sets are less like a standard error function 
compared to those found in vegetation data sets (referring to Figure 4.1 to 
4.1 1). The reason for this is that the total numbers of sample functions used to 
estimate the covariance matrices in the soil data sets are very limited, from a 
little more than the dimensionality in data set OM1, that is, 255 sample functions 
with dimensionality 200, to about 2.5 times the dimensionality in SO, OM2, 10 
and ST, that is about 500 sample functions for each data set; while on the other 
hand at least 8 times the dimensionality are available in the vegetation data 
sets. For example, the smallest data set K1 has 832 sample functions with 
dimensionality 100 and data sets other than K1 have more than 1000 sample 
functions to estimate the covariance matrix. Therefore, the estimates of the 
covariance matrices for the vegetation data sets are likely to be much more 
accurate than those for the soil data sets. The subsequent Gaussian model thus 
becomes more valid for the vegetation data and the cumulative classification 
curves are more like a standard error function. 

Furthermore, Figure 4.12 to 4.16 show that the infinite clipped optimal 
functions are very effective to extract the information for soil classification. For 
instance, Figure 4.12 to 4.13 indicate that using the first 16 infinite clipped 
optimal functions, over 90% accuracy can be achieved while Figure 4.14 to 
4.16 tell that over 85% accuracy is obtained. Due to the limited sample size for 
each of the soil data sets, different degrees of the Hughes phenomenon occur. 
Figure 4.12 to 4.14 show that canonical analysis improves the performance for 
the first 5 features while Figure 4.15 to 4.16 show that improvement is possible 
up to the first 7 features. 
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Table 4.2 Soil Data Sets : 



(a) SO by Soil Order 
Sample size for the first 6 classes : 479 


class # Order Name I # of Sample Functions 


Mollisol 


Alfisol 


Entisol 


Aridisol 


Ultisol 



Vertisol 


Histosol 


Oxisol 


Unclassified 



(b) OM1 by Organic #1 

Soil from Mollisol and Alfisol only. Sample size : 255 

Class # I Organic Matter Range % I # of Sample Functions 


1 0.11 ~ 1.5 51 


2 1.5 ~ 2.0 54 


3 2.0 - 2.5 33 


2.5 ~ 3.5 45 


5 3.5 ~ 5.0 39 


6 5.0 ~ 10.12 33 


(c) OM2 by Organic #2 


Soil from all orders. Sample size : 514 


Class # 

Organic Matter Range % 

# of Sample Functions 

1 

0.08 ~ 1.0 

82 

2 

1.0 ~ 2.0 

135 

3 

2.0 - 3.0 

120 

4 

3.0 ~ 4.0 

54 

5 

4.0 ~ 6.0 

59 

6 

6.0 ~ 84.79 

64 
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Table 4.2, continued 


(d) 10 by Iron Oxide Content 
Sample size : 467 



(e) ST by Soil Texture 

Total sample size : 483 excluding the unclassified 


Class # 

Soil Texture Group/Groups 



1 


19 




21 

40 

2 

Sandy Clay Loam 

6 



Clay Loam 

25 



Silty Clay Loam 

32 

63 

3 

Coarse Sand 

3 



Large Coarse Sand 

6 



Sand 

13 



Large Sand 

16 



Large Fine Sand 

18 



Fine Sand 

20 

76 

4 

Coarse Sandy Loam 

5 



Very Fine Sandy Loam 

12 



Sandy Loam 

24 



Fine Sandy Loam 

52 

93 

5 

Loam 

68 

68 

6 

Silt 

4 



Silt Loam 

139 

143 

7 

Unclassified 

r 88 

88 
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Soil Data Set Grouped by Soil Order 



Number of Features 


Figure 4.12 Classification Performance for Soil 
Data Grouped by Soil Order 
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Soil Data Set Grouped by Organic Matter#1 



Number of Features 


Figure 4.13 Classification Performance for Soil 
Data Grouped by Organic #1 
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Soil Data Set Grouped by Organic Matter#2 



Number of Features 


Figure 4.14 Classification Performance for Soil 
Data Grouped by Organic #2 
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Soil Data Set Grouped by Iron Oxide Content 



Number of Features 


Figure 4.15 Classification Performance for Soil 
Data Grouped by Iron Oxide 
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4.3 Hughes Phenomenon 

In 1968, Hughes [42] showed theoretically that the mean recognition 
accuracy for the statistical pattern classifiers did not always increase as the 
measurement complexity increased so long as the number of training samples 
was fixed and finite. This result was experimentally demonstrated in a remote 
sensing context by Fu, Landgrebe and Phillips [43] in 1969. The conclusion of 
these investigations was that for a fixed number of training samples, there is an 
optimal measurement complexity. More complexity is undesirable from the 
standpoint of expected classification accuracy. 

Kalayeh, Muasher and Landgrebe [51,52] developed a criterion to 
predict the occurrence of the Hughes phenomenon. They suggested that a 
number of sample functions equal to about 8 to 10 times the dimensionality 
must be available for the ensemble in order to avoid the Hughes phenomenon. 

In this section, four experiments are described to show that the Hughes 
phenomenon did occur in the data sets with limited training samples. The data 
sets K1 and N2 were chosen for this purpose because K1 has the least training 
samples ( referring to Table 1.1 ) among all vegetation data sets and N2 
( referring to Figure 4.5 ) indicated some possibility for the occurrence of the 
Hughes phenomenon. Tables 4.3(a) to (d) show the data used for these 4 
experiments and Figures 4.17 to 4.20 show the results. In the above tables and 
figures, K1H and N2H are the data sets with about one half of the original 
training samples while K1Q and N2Q represent those with approximately one 
quarter of the training samples. 
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Figure 4.17 and 4.18 show that for data set K1, the Hughes phenomenon 
has occurred ( referring to Figure 4.1 ). If the size of the training samples is 
reduced to half or even to quarter, the effect of this phenomenon becomes 
more and more serious. On the other hand, for data set N2, there is no 
Hughes phenomenon ( referring to Figure 4.5 ). If the size of the training 
samples becomes one half of the original N2, the Hughes phenomenon might 
or might not occur. Figure 4.19 indicates that for data set N2, reducing the size 
of the training samples to approximately one half, that is 630 samples with 
dimensionality 100, the estimate of covariance matrix is still accurate enough, 
and the Hughes phenomenon does not occur. 

However, if the training size of the data set N2 is reduced to one quarter, 
the Hughes phenomenon does occur. Figure 4.20 says that the optimal 
number of features in this data set N2Q with 315 training samples is only 2. 
The maximal classification accuracy that can be achieved is about 85%. 
Furthermore, more than 2 features used for classification would not help the 
performance and in some cases even reduce the accuracy. 

The four experiments in this section indicate that for data set K1, more 
than 832 samples are needed in order to avoid the effect of Hughes 
phenomenon; on the other hand, for data set N2, 1239 samples are enough to 
accurately estimate the covariance matrix. From the classification performances 
of data sets K1 , K2, K3 and N1 , shown in Figure 4.1 to 4.4, it is suggested that 
more than 15 times dimensionality sample functions may be required to avoid 
the effect of the Hughes phenomenon. 
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Table 4.3 Data Sets Used to Test the Occurrence of 
the Hughes Phenomenon : 


(a) Kansas September Data With Half Training Samples : 

Data Set K1H 


K1H 

Winter Wheat 

Summer Fallow 



Training 

70 

200 

140 

410 

IH&ilifeH 

71 

214 

137 

412 

Total 

141 

414 

277 

832 


(b) Kansas September Data With Quarter Training Samples : 

Data Set K1Q 


K1Q 

Winter Wheat 

Summer Fallow 

KrilkHiBsMffRUfnl 


laifeihiiiM 

35 

100 

70 

205 

Testing 

106 

314 

207 

627 1 

Totai 

141 

414 

277 

832 


(c) North Dakota June Data With Half Training Samples : 

Data Set N2H 


N2H 


Summer Fallow 

Natural Pasture 


Training 

400 

“ 150 

80 ~ n 

630 

Testing 

387 

141 

81 

609 

Total 

787 

291 

161 

1239 


(d) North Dakota June Data With Quarter Training Samples : 

Data Set N2Q 


N2Q 


Summer Fallow 

Natural Pasture 


Training 

200 

75 

40 

315 

Testing 

587 

216 

121 

924 

\MEEM 

787 

291 

161 

1239 
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Kansas September Data 
Half Training Samples 



Figure 4.17 First Experiment of the Hughes Phenomenon : 

Data Set K1H 
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Kansas September Data 
Quarter Training Samples 



Figure 4.18 Second Experiment of the Hughes Phenomenon : 

Data Set K1Q 
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North Dakota June Data 
Half Training Samples 



Number of Features 


Figure 4.19 Third Experiment of the Hughes Phenomenon : 

Data Set N2H 
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North Dakota June Data 
Quarter Training Samples 



Figure 4.20 Fourth Experiment of the Hughes Phenomenon : 

Data Set N2Q 
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4.4 Signal to Noise Ratio Considerations 

In the previous sections, the classification results obtained by using the 
spectral features developed in this research are presented for 100 dimensional 
FSS vegetation data and 200 dimensional Exotech-C soil data. It is found 
( referring to Figure 4.1 to 4.16 ) that about 10 to 1 compression ratio can be 
achieved while maintaining satisfactory classification accuracy. One question 
an Earth scientist user of the algorithm may have is that the 10 to 1 downlink 
data rate reduction is not at a severe cost to the usefulness of the data. Thus, in 
this section, we will discuss the data volume reduction issue from the Earth 
scientist point of view, that is, from signal-to-noise ratio considerations. 

Weighted Karhunen-Loeve transform rotates the original N-dimensional 
signal space to a more favorable orientation. This orientation is one in which 
the source energy is redistributed such that a larger percentage of the energy is 
distributed over fewer coordinates. Table 4.4 and Figure 4.21 show how the 
source energy is redistributed over the first 25 transformed coordinates for 100 
dimensional vegetation data set K2. 

In Table 4.4, the first row shows that the magnitude of the total source 
energy is 3497, which is the sum of all eigenvalues; Further, the mean square 
representation error (MSE) and percent mean square representation error 
(%MSE) are 3497 and 100% respectively if 'none' of the optimal feature is used 
to transform the data. The second row indicates that the magnitude of the first 
eigenvalue is 2779.8; If the first optimal feature is used to transform the data, 
the representation error and percent representation error will be 717 and 20.5% 
respectively, that is, the first transformed coordinate contains about 79.5% 
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source energy in it. Similarly, it can be found that using the first 2 optimal 
features, about 97.5% of the total source energy can be preserved, and using 
the first 10 optimal features to transform the data in the measurement space, 
the percent mean square representation error, that is 0.17%, is indeed 
negligible. Figure 4.21 shows graphically how fast the representation error can 
be reduced by using the first few optimal features. It should be noticed that the 
representation error (MSE) is plotted in logarithmic scale. 

The practical values of the signal to noise ratio in a typical remote 
sensing system are from 50 to 200 in most of the 0.4 to 2.5 pm spectrum range 
[1], This indicates that the maximal noise level in the system is only 1/50, that is, 
2%. Since using the first 10 optimal features derived from the Weighted K-L 
transform preserves almost all the signal energy in the original measurement 
space; Further, the representation error level is 0-17% which is much lower 
than the noise level in the system. Hence, the effect on the signal to noise ratio 
due to compression is quite limited even as the signal to noise ratio is down to 
20. Therefore, a data volume reduction by a factor of 10 is achieved with 
essentially no loss of information. 
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Table 4.4 Mean Square Representation Error for Data Set K2 
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Mean Square Representation Error 



Number of Optimal Features used 


Figure 4.21 Mean Square Representation Error for Data Set K2 
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CHAPTER V 

CONCLUSIONS AND RECOMMENDATIONS 
5.1 Conclusions 

The fundamental objective of this research is to develop an objective and 
practical spectral feature design technique for high dimensional multispectral 
data. In this thesis, four spectral feature design techniques have been 
developed. Two of them, non-overlapping band feature selection algorithm 
and overlapping band feature selection algorithm, are derived from the spectral 
dominancy concept of the optimal functions; the other two, Walsh function 
approach and infinite clipped optimal function approach, are derived from the 
spectral similarity concept of the optimal functions. These four approaches 
have been proved effective for data compression and classification purposes in 
high dimensional multispectral data. 

A comparison among these four techniques indicates that the infinite 
clipped optimal function approach is the best scheme since the features are 
easiest to find and their classification performance is the best under the same 
compression requirement. This technique approximates the spectral structure of 
the optimal features via infinite clipping and results in transform coefficients 
which are either +1, -1 or 0. Therefore the necessary processing can be easily 
implemented on-board the spacecraft by using a set of programmable adders 
that operate on the grouping instructions received from the ground station. 
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After the preprocessed data has been received, canonical analysis is 
further used to find the best set of features under the criterion that maxima! class 
separability is achieved 

Both vegetation and soil data have been tested in this research. For 
vegetation data, four sets of multitemporal multispectral vegetation data 
collected in Kansas, North Dakota, Iowa and South Dakota respectively with 9 
to 42 information classes in 1976 to 1979 are used to test the spectral feature 
design system. One spatially and temporally combined data set is also formed 
by combining the Kansas and North Dakota Data sets to test the robustness 
property of the scheme. The results indicate that the system is not overly 
sensitive to spatial and temporal variation. 

Furthermore, a soil data base collected by Eric Stoner in 1979 was also 
acquired and used to test the system. In this research, five different soil data 
sets grouped by the soil order, organic content #1, organic content #2, iron 
oxide content and soil texture are formed. The classification performances are 
then found. It is shown that soil order, organic content percentage, iron oxide 
content percentage and soil texture can be delineated and predicted by the 
proposed technique. 

It is concluded that the infinite clipped versions of the first 16 optimal 
functions derived from the Weighted Karhunen-Loeve Transform have excellent 
classification performance. Further signal processing by canonical analysis 
increases the compression ratio while retains the classification accuracy. The 
overall probability of correct classification of the proposed system is over 90% 
while providing for a reduced downlink data rate by a factor of 10. 
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5.2 Recommendations 

The spectral feature design system developed in this research has been 
demonstrated for the FSS vegetation data and the Exotech-C soil data. In the 
future, it is proposed to test AVIRIS and HIRIS data. The following procedure is 
recommended : 

(A) Pre-Flight Stage : 

(1) Collect enough representable samples from all reference sources 
available, for example, the field data base collected in the past, to 
form the ensemble of a specific problem (Ground Truth Gathering) 

(2) Calculate the mean vector and covariance matrix of this ensemble 

(3) Find the eigenvectors of the covariance matrix 

(4) Run the spectral feature design system on the ground to find the 
grouping coefficients ( either +1, -1, orO ) 

(B) On-Board Preprocessing Stage : 

(5) Send up these grouping coefficients ( instructions ) to the 
spacecraft for on-board data preprocessing 

(C) Post-Flight Stage : 

(6) Receive the preprocessed low dimensional data 

(7) Run the spectral feature design system on the ground to find the 
canonical features 

(8) Use these canonical features to further transform the received data 
into the final signal space where the data classification is 
performed 



98 


In this procedure, there are basically 3 processing stages involved : 
pre-flight stage, on-board preprocessing stage and post-flight stage. The pre- 
flight stage, which consists of step 1 to step 4, is used to gather ground truth 
information, estimate ensemble statistics and find appropriate grouping 
coefficients from one of the four developed schemes. This stage would be done 
before the data take by the aids of aerial photography, topographical maps, 
historical information, field data base collected in the past or other reference 
sources available. One more comment about this stage is the problem of the 
sample size, it is suggested from the experience in this research that the total 
number of samples used to estimate the ensemble statistics needs to be at least 
15 times their signal dimensionality in order to accurately estimate the 
covariance matrix. 

The second stage, on-board preprocessing stage, which contains step 5, 
performs band groupings on board the spacecraft, either summing (+1), 
subtracting (-1) or omitting ( 0 ) bands for each spectral function according to the 
grouping instructions sent by the ground user. Since this data preprocessing 
stage would be done on board the spacecraft, from implementation point of 
view, the algorithm simplicity is then required and important. The spectral 
feature design system developed in this research makes this simplicity possible. 
Figure 1.1 shows how the data preprocessing can be implemented on board 
the spacecraft by a set of programmable adders. 

Finally, the post-flight stage, which includes step 6 to step 8, is applied 
to further process the received transformed data such that the maximal class 
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separability is achieved. Since this stage and the pre-flight stage would be 
done at the ground station, the algorithm simplicity is therefore less important 
than that in the on-board preprocessing stage. Hence, it might be more 
effective to use the overlapping band feature selection algorithm to design the 
features in some future situations although it's the most complex among the 
four techniques developed in this research. 
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Appendix A IBM 3083 Macro File 


/* RUN A FORTRAN PROGRAM USING IMSLSP OR IMSLDP SUBROUTINES */ 
ARG FN FN1 FN2 FN3 FN4 FN5 FN6 FN7 FN8 FN9 FN10 FN11 
LINKTO IMSL 

GLOBAL TXTLIB IMSLSP IMSLDP PFORTLIB VSF2FORT CMSLIB 
GLOBAL LOADLIB VSF2LOAD 
FORTVS2 FN 
LOAD FN 

FILEDEF 11 DISK FN1 DATA Cl 
FILEDEF 12 DISK FN2 DATA Cl 
FILEDEF 13 DISK FN3 DATA Cl 
FILEDEF 14 DISK FN4 DATA Cl 
FILEDEF 15 DISK FN5 DATA Cl 
FILEDEF 16 DISK FN6 DATA Cl 
FILEDEF 17 DISK FN7 DATA Cl 
FILEDEF 18 DISK FN8 DATA Cl 
FILEDEF 19 DISK FN9 DATA Cl 
FILEDEF 20 DISK FN10 DATA Cl 
FILEDEF 21 DISK FN11 DATA Cl 
START 
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Appendix B Spectral Feature Design System - Program Listing 


PROGRAM MV 

PARAMETER (NP2=1 551, NPl=100,NP3=NPl* (NP1+1) /2,NF2=10,NF3=5) 

REAL X(NP2,NP1) ,XM(NP1) ,VCV (NP3) 

DATA IFLAG1,XM, VCV/0,NP1*0 . 0, NP3*0 . 0/ 

NP1 : DIMENSIONALITY OF SAMPLE FUNCTIONS 

NP2 : TOTAL NUMBER OF SAMPLE FUNCTIONS 

NP3 : TOTAL NUMBER OF ELEMENTS IN COVARIANCE MATRIX VC V 

NF2 : RAW DATA INPUT FILE STORED IN FORMAT 10F8.3 

NF3 : XM & VCV OUTPUT DATA FILE STORED IN FORMAT 5E15.7 

X : RAW DATA ( INPUT ) 

XM : MEAN VECTOR ( OUTPUT ) 

VCV : COVARIANCE MATRIX STORED IN SYMMETRIC MODE { OUTPUT ) 

IFLAG1 INTERNAL CHECKING PARAMETER 

11 = DATA FILE ; 12 = MV FILE 

OPEN (11) 

OPEN (12) 

REWIND 11 
REWIND 12 

READ IN RAW DATA AND PRINT THE PROGRESS FOR EVERY 100 SAMPLES 

DO 20 ISAMP=1,NP2 
K=MOD (ISAMP, 100) 

IF (K.EQ.0) PRINT*, ' NP2 = ',NP2,' ; ISAMP = ISAMP 
DO 20 1=1, NP1/NF2 

20 READ (11,1) (X (ISAMP, J) , J=l+ ( I —1 ) *NF2, I*NF2) 

PRINT*, ' DATA READ IN FINISHED ' 

1 FORMAT (10F8. 3) 

FIND THE ENSEMBLE MEAN VECTOR 

DO 30 J=l, NP1 
DO 30 1=1, NP2 
30 XM(J)=XM(J)+X(I, J) 

DO 40 1=1, NP1 
40 XM(I)=XM(I) /FLOAT (NP2) 

PRINT*, ' MEAN VECTOR FOUND ' 

FIND THE ENSEMBLE COVARIANCE MATRIX AND PRINT THE PROGRESS FOR 
EVERY 10 DIMENSIONS 

DO 50 1=1, NP1 
KX=MOD (1,10) 

IF (KX, EQ. 0) PRINT* , I 
DO 50 J=1,I 
IND= (1-1) *1/2+ J 


mv 

00010 

MV 

00020 

MV 

00030 

MV 

00040 

MV 

00050 

MV 

00060 

MV 

00070 

MV 

00080 

MV 

00090 

MV 

00100 

MV 

00110 

MV 

00120 

MV 

00130 

MV 

00140 

MV 

00150 

MV 

00160 

MV 

00170 

MV 

00180 

MV 

00190 

MV 

00200 

MV 

00210 

MV 

00220 

MV 

00230 

MV 

00240 

MV 

00250 

MV 

00260 

MV 

00270 

MV 

00280 

MV 

00290 

MV 

00300 

MV 

00310 

MV 

00320 

MV 

00330 

MV 

00340 

MV 

00350 

MV 

00360 

MV 

00370 

MV 

00380 

MV 

00390 

MV 

00400 

MV 

00410 

MV 

00420 

MV 

00430 

MV 

00440 

MV 

00450 

MV 

00460 

MV 

00470 

MV 

00480 

MV 

00490 

MV 

00500 
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DO 50 K=l, NP2 

50 VCV (IND) =VCV (IND) + (X (K, I) *X (K, J) -XM (I) *XM ( J) ) 

DO 60 1=1, NP 3 

60 VCV(I) =VCV (I) /FLOAT (NP2-1) 

PRINT*, ' COV. MATRIX FOUND ’ 

C 

C INTERNAL CHECKING FOR ALGORITHM ACCURACY 
C 

DO 80 1=1, NP1 
IND=I* (1+1) /2 

IF (VCV ( IND ) . LT .0.0) GO TO 70 
GO TO 80 

70 WRITE (*, 2) I, VCV (IND) 

2 FORMAT (’ACCURACY OF ALGORITHM IS POOR AT I =’,I5, 

+’ WHERE VCV(I,I) = 1 , E15 . 7 ) 

VCV(I)=-VCV(I) 

IFLAG1=IFLAG1+1 
80 CONTINUE 
C 

C PRINT THE COMMENTS FOR ACCURACY 

C 

IF (IFLAG1 . GT . 0 ) GO TO 90 

PRINT*, ’ POSITIVE VARIANCES CHECK DONE ’ 

WRITE {*, 4) 

GO TO 100 

90 WRITE (*, 3) IFLAG1 

3 FORMAT (’ THERE ARE ’,15,’ VARIANCES LESS THAN 0.0 ’) 

4 FORMAT (’ ALL VARIANCES ARE ">= 0.0", ACCURACY IS GOOD’) 
C 

C SEND THE RESULTS TO OUTPUT DATA FILE 

C 

100 DO 110 I=1,NP1/NF3 

110 WRITE (12, 5) (XM(J), J=1+(I-1)*NF3,I*NF3) 

5 FORMAT ( 5E1 5 . 7 ) 

DO 120 1=1, NP3/NF3 

120 WRITE (12, 5) (VCV ( J) , J=l+ (1-1 ) *NF3, I*NF3) 

STOP 

END 


MV 00510 
MV 00520 
MV 00530 
MV 00540 
MV 00550 
MV 00560 
MV 00570 
MV 00580 
MV 00590 
MV 00600 
MV 00610 
MV 00620 
MV 00630 
MV 00640 
MV 00650 
MV 00660 
MV 00670 
MV 00680 
MV 00690 
MV 00700 
MV 00710 
MV 00720 
MV 00730 
MV 00740 
MV 00750 
MV 00760 
MV 00770 
MV 00780 
MV 00790 
MV 00800 
MV 00810 
MV 00820 
MV 00830 
MV 00840 
MV 00850 
MV 00860 
MV 00870 
MV 00880 


PROGRAM EV 

PARAMETER (NP1=100,NP3=NP1* (NP1+1) /2,NP5=NP3+NP1, 
+NF2=10,NF3=5) 

REAL XM(NPl) , VCV (NP3) , VCVF (NP1,NP1) , D (NP1) , 

+Z (NP1, NP1 ) , WK2 (NP5) 

PFftT TRAPP 1 RTTM 

DATA JOB2, IFLAG1, SUM, TRACE/2, 0, 2*0 . 0/ 

C 


c 

NP1 

RAW DATA DIMENSIONALITY 

c 

NP3 

TOTAL NUMBER OF ELEMENTS FOR VCV 

c 

NP5 

DIMENSION FOR PERFORMANCE INDEX MATRIX WK2 

c 

c 

XM 

MEAN VECTOR 



c 

VCV 

COVARIANCE MATRIX 

( 

SYMMETRIC STORAGE MODE 

c 

VCVF 

COVARIANCE MATRIX 

( 

FULL STORAGE MODE ) 

c 

D 

EIGENVALUE 




EV 00010 
EV 00020 
EV 00030 
EV 00040 
EV 00050 
EV 00060 
EV 00070 
EV 00080 
EV 00090 
EV 00100 
EV 00110 
EV 00120 
EV 00130 
EV 00140 
EV 00150 
EV 00160 
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Z : EIGENVECTOR EV 

WK2 : PERFORMANCE INDEX MATRIX EV 

EV 

11 : INPUT MV FILE ; 12 : OUTPUT EV FILE EV 

EV 

OPEN (11) EV 

OPEN (12) EV 

REWIND 11 EV 

REWIND 12 EV 

EV 

READ IN MEAN VECTOR AND COVARIANCE MATRIX EV 

EV 

DO 10 1=1, NP1/NF3 EV 

10 READ (11, *) (XM(J) , J=l+ (1-1) *NF3, I*NF3) EV 

1 FORMAT (5E15. 7) EV 

DO 20 I=1,NP3/NF3 EV 

20 READ (11, *) (VCV (J) , J=l+ (I— 1 ) *NF3, I*NF3) EV 

CALL VCVTSF (VCV, NP1, VCVF, NP1) EV 

EV 

FIND TRACE, EIGENVALUES AND EIGENVECTORS OF THE COVARIANCE MATRIX EV 

EV 

DO 30 1=1, NP1 EV 

30 TRACE=TRACE+VCVF(I, I) EV 

CALL EIGRS (VCV,NPl, JOB2, D, Z, NP1, WK2, IER) EV 

EV 

PRINT THE PERFORMANCE INDEX AND ACCURACY COMMENTS EV 

EV 

IF ( IER . NE . 0 . OR . WK2 ( 1 ) . GE . 1 . 0 ) GO TO 40 EV 

WRITE (*, 3) IER,WK2 (1) EV 

GO TO 50 EV 

40 WRITE (*, 2) IER,WK2 (1) EV 

2 FORMAT (' PERFORMANCE OF "EIGRS" IS POOR, IER =',I5, EV 

+' WK2 (1) =' ,E15. 7) EV 

3 FORMAT (' PERFORMANCE OF "EIGRS" IS GOOD, IER =',I5, EV 

+' WK2 (1) = ' , E15 . 7) EV 

EV 

INTERNAL CHECKING FOR ACCURACY EV 

EV 

50 DO 70 1=1, NP1 EV 

IF (D (I) ,LE . 0 . 0) GO TO 60 EV 

GO TO 70 EV 

60 WRITE ( * , 4 ) I , D ( I ) EV 

4 FORMAT (' EIGEN VALUE IS "< = 0.0" AT I =', 15, EV 

+' WHERE D (I) =' , E15. 7) EV 

IFLAG1=IFLAG1+1 EV 

70 CONTINUE EV 

IF (IFLAG1 .GT. 0) GO TO 80 EV 

WRITE (*, 6) EV 

GO TO 90 EV 

80 WRITE (*, 5) IFLAG1 EV 

5 FORMAT (' THERE ARE', I 5,' NEGATIVE OR ZERO EIGEN VALUES ') EV 

6 FORMAT ( ' ALL EIGEN VALUES ARE GREATER THAN ZERO ' ) EV 

EV 

FIND THE SUM OF THE EIGENVALUES AND PRINT THE ACCURACY COMMENTS EV 

EV 

90 CALL VABSMF (D,NP1, 1, SUM) EV 

IF (ABS (TRACE-SUM) .GT.1.0E-1 ) GO TO 100 EV 


00170 

00180 

00190 

00200 

00210 

00220 

00230 

00240 

00250 

00260 

00270 

00280 

00290 

00300 

00310 

00320 

00330 

00340 

00350 

00360 

00370 

00380 

00390 

00400 

00410 

00420 

00430 

00440 

00450 

00460 

00470 

00480 

00490 

00500 

00510 

00520 

00530 

00540 

00550 

00560 

00570 

00580 

00590 

00600 

00610 

00620 

00630 

00640 

00650 

00660 

00670 

00680 

00690 

00700 

00710 

00720 

00730 
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WRITE (*, 8 ) TRACE , SUM 
GO TO 110 

100 WRITE (*, 7) TRACE, SUM 

7 FORMAT (' ACCURACY OF "EIGRS" IS POOR, TRACE =',E15.7, 
+ ' SUM = ' , E15 .7) 

8 FORMAT (' ACCURACY OF "EIGRS" IS GOOD, TRACE =',E15.7, 
+ ' SUM =' ,E15. 7) 

SEND THE RESULTS TO THE OUTPUT DATA FILE 

110 WRITE (12, 9) TRACE, SUM 

9 FORMAT ( 2E1 5 • 7 ) 

DO 120 I=1,NP1/NF3 

120 WRITE (12,1) (D (NP1+1-J) , J=l + (1-1) *NF3, I*NF3) 

DO 130 J=l, NP1 
DO 130 1=1, NP1/NF3 

130 WRITE (12,1) (2 (K, NP1+1-J) , K=l+ (1-1 ) *NF3, I*NF3) 

STOP 

END 


EV 00740 
EV 00750 
EV 00760 
EV 00770 
EV 00780 
EV 00790 
EV 00800 
EV 00810 
EV 00820 
EV 00830 
EV 00840 
EV 00850 
EV 00860 
EV 00870 
EV 00880 
EV 00890 
EV 00900 
EV 00910 
EV 00920 


C Z 

C 

C 11 : INPUT EIGENVECTOR FILE; 12 

C 

OPEN (11) 

OPEN (12) 

REWIND 11 
REWIND 12 
READ (11,*) XI, X2 
DO 10 1=1, NP1/5 
10 READ (11, *) XI, X2, X3, X4, X5 


00010 
00020 
00030 
00040 
00050 
00060 
00070 
00080 
00090 
00100 
00110 
00120 
00130 
00140 
00150 
00160 
00170 
00180 
00190 
00200 
BS 00210 
BS 00220 
BS 00230 

: OUTPUT N.O.L. BAND FILE BS 00240 

BS 00250 
BS 00260 
BS 00270 
BS 00280 
BS 00290 
BS 00300 
BS 00310 
BS 00320 
BS 00330 


PROGRAM NOLBS BS 

PARAMETER (NP1=100, NTERM=6, NV=50, NZl=NPl*NV, Nl=l, N2=100) BS 

C BS 

C FOR FSS VEGETATION DATA : N1 = 1; N2 = 100 BS 

C FOR SOIL DATA : N1 = 4; N2 = 192 BS 

C BS 

C FOR SOIL DATA ( FROM EFFECTIVE WAVELENGTH 0.52 TO 2.32UM;180 DIM ) BS 
C Nl=l, N2=180 BS 

C BS 

REAL X (NP1 , NTERM) , AVE (NP1 ) , SI (NP1 ) , 2 (NP1 , NV) BS 

DATA Z/NZ1*0 . 0/ BS 

C BS 

C NP1 : RAW DATA DIMENSIONALITY BS 

C NTERM : TOTAL NUMBER OF OPTIMAL FUNCTIONS USED IN THE ALGORITHM BS 

C NV : PRESET MAX NUMBER OF N.O.L. BANDS, INCREASE IT IF NEEDED BS 

C N1 : THE STARTING WAVELENGTH POINT BS 

C N2 : THE ENDING WAVELENGTH POINT BS 

C BS 

C X : EIGENVECTOR ( INPUT ) BS 

C AVE : AVERAGE OF THE FIRST 'NTERM' EIGENVECTORS BS 

C SI : SIGNED VERSION OF AVE(NPl) 

DESIRED N.O.L. BAND FEATURES ( OUTPUT ) 
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C READ IN EIGENVECTORS 

C 

DO 20 ITERM=1 , NTERM 
DO 20 J=l,NPl/5 

20 READ (11,*) (X (I, ITERM) , 1=1+ (J-l) *5, J*5) 

C 

C FIND THE AVERAGE OF THE FIRST 'NTERM' EIGENVECTORS AND 
C ITS SIGNED VERSION 

C 

DO 40 J=l, NP1 

AVE ( J) =0 . 0 

DO 30 ITERM=1, NTERM 

30 AVE ( J) =AVE ( J) +X ( J, ITERM) /FLOAT (NTERM) 

IF (NP1 .NE . 100) GO TO 35 
IF ( J. GE . 45 . AND . J. LE . 54 ) AVE ( J) =0 . 0 
IF ( J. GE . 70 . AND . J. LE . 79 ) AVE ( J) =0 . 0 
35 IF (AVE (J) .LT.0.0)S1 (J)=-1.0 
IF (AVE (J) . GT. 0 . 0) SI (J)=1.0 
IF (AVE (J) . EQ. 0 . 0) SI ( J) =0 . 0 
40 CONTINUE 
C 

C THE NEXT 3 LINES CAN BE USED TO PLOT AVE (I) AND SI (I) 

C 

C DO 50 1=1, NP1 
C 50 WRITE (12, 51) AVE (I) , I, SI (I) 

C 51 FORMAT (E15 . 7, 15, F5 . 0) 

IVEC=1 

Z (Nl, IVEC) =ABS (SI (Nl) ) 

C 

C FIND N.O.L. BAND FEATURES FROM SI 
C 

DO 60 I=N1+1,N2 
IF (NP1 .NE. 100) GO TO 55 
IF (I .GE . 45 .AND . I . LE . 54 ) GO TO 60 
IF (I . GE .70. AND . I . LE . 7 9 ) GO TO 60 
55 IF (SI (1-1) .NE.S1 (I) ) IVEC=IVEC+1 
WRITE (12,*) I, IVEC 
IF (IVEC , GE .NV) GO TO 120 
Z (I, IVEC) =ABS (SI (I) ) 

60 CONTINUE 
C 

C NORMALIZE THE FEATURES AND SEND THEM TO THE OUTPUT FILE 
C 

DO 100 J=1 , IVEC 
XN1=0 . 0 
DO 70 1=1, NP1 
70 XN1=XN1+Z (I, J) *Z (I, J) 

DO 80 1=1, NP1 
80 Z(I, J)=Z(I, J)/SQRT(XN1) 

DO 90 Il=l,NPl/5 

90 WRITE (12,91) (Z (I, J) , 1=1+ (11-1) *5, 11*5) 

91 FORMAT (5E15. 7) 

100 CONTINUE 

120 PRINT*,' TOTAL NUMBER OF N.O.L. BAND FEATURES =',IVEC 
STOP 
END 


BS 00340 
BS 00350 
BS 00360 
BS 00370 
BS 00380 
BS 00390 
BS 00400 
BS 00410 
BS 00420 
BS 00430 
BS 00440 
BS 00450 
BS 00460 
BS 00470 
BS 00480 
BS 00490 
BS 00500 
BS 00510 
BS 00520 
BS 00530 
BS 00540 
BS 00550 
BS 00560 
BS 00570 
BS 00580 
BS 00590 
BS 00600 
BS 00610 
BS 00620 
BS 00630 
BS 00640 
BS 00650 
BS 00660 
BS 00670 
BS 00680 
BS 00690 
BS 00700 
BS 00710 
BS 00720 
BS 00730 
BS 00740 
BS 00750 
BS 00760 
BS 00770 
BS 00780 
BS 00790 
BS 00800 
BS 00810 
BS 00820 
BS 00830 
BS 00840 
BS 00850 
BS 00860 
BS 00870 
BS 00880 
BS 00890 
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PROGRAM WALSH 

THIS PROGRAM IS USED TO GENERATE THE FIRST 64 100-DIM. WALSH FUN. 
IN THIS PROGRAM WE SET W1=0 . 1 AND W2=-0.1 SUCH THAT NORM (W) =1^0 
NP1 = 100, M = 6 , NF4 = 5 USED FOR 64 100-DIM WALSH FUNCTIONS 

PARAMETER (NP1=100,M=6,NTVEC=2**M,NMAX=2** (M-l) , 

+W1=0 . 1, W2=-0 . l,NF4=5,NP5=NPl/2, NP6=NP1/ 4) 

REAL Z (NP1,NTVEC) , ZW1 (NP1,NMAX) , ZW2 (NPl,NMAX) 

INTEGER NZERO (NTVEC) 


NP1 

M 

NTVEC 

W1 

W2 

NF4 

Z 

ZW1 

ZW2 

NZERO 


DIMENSIONALITY OF WALSH FUNCTION 
TOTAL NUMBER OF WALSH FUNCTIONS IS 2**M 
TOTAL NUMBER OF WALSH FUNCTIONS 

THE NORMALIZED LENGTH OF 10 0-DIM. WALSH FUNCTION 
THE NEGATIVE OF W1 
OUTPUT FORMAT USE 

RESULTS OF WALSH FUNCTIONS ( OUTPUT ) 

INTERMEDIATE MATRIX FOR WALSH FUNCTION GENERATION 
INTERMEDIATE MATRIX FOR WALSH FUNCTION GENERATION 
CHECKING VECTOR FOR AXIS CROSSINGS OF WALSH FUNCTIONS 


SET UP THE FIRST 4 WALSH FUNCTIONS 

DATA ( (Z (I, J),I=1,NP1), J=1,4)/NP1*W1, NP5*W1, NP5*W2, 

+NP6*W1,NP5*W2,NP6*W1, NP6*W1,NP6*W2,NP6*W1,NP6*W2/ 

OPEN (11) 

REWIND 11 

STORE THE THIRD AND FOURTH WALSH FUNCTIONS 

DO 10 J=l, 2 
DO 10 1=1, NP1 
10 ZW1(I, J)=Z(I,2+J) 

PRINT*, 'IM = 0,1,2, SEQ : Z (I, 1) , Z (I, 2) , ZW1 (I, 1) , ZW1 (I, 2) ' 
DO 20 1=1, NP1 

20 WRITE (*,*)I,Z(I,1),Z(I,2), ZW1 (1,1) , ZW1 (1,2) 

GENERATE THE FIRST 2**M WALSH FUNCTIONS 

DO 70 IM=3,M 
K=2** (IM-1) 

DO 30 IK=1 , K-l , 2 

IKM= (IK+1) /2 

DO 30 1=1, NP5 

ZW2 (I, IK) =ZW1 (2*1, IKM) 

30 ZW2 (NP5+I, IK) = ( (-1 . ) ** (IKM+1) ) *ZW1 (2*1, IKM) 

DO 40 IK=2,K, 2 

IKM=IK/2 

DO 40 1=1, NP5 

ZW2 (I, IK) =ZW1 (2*1, IKM) 

40 ZW2 (NP5+I, IK) = { (-1 . ) ** (IKM) ) *ZW1 (2*1, IKM) 

DO 50 IK=1,K 
DO 50 1=1, NP1 
Z (I, K+IK) =ZW2 (I, IK) 

50 ZW1 (I, IK) =ZW2 (I, IK) 


WAL00010 

WAL00020 

WAL00030 

WAL00040 

WAL00050 

WAL00060 

WAL00070 

WAL00080 

WAL00090 

WAL00100 

WAL00110 

WAL00120 

WAL00130 

WAL00140 

WAL00150 

WAL00160 

WAL00170 

WAL00180 

WAL00190 

WAL00200 

WAL00210 

WAL00220 

WAL00230 

WAL00240 

WAL00250 

WAL00260 

WAL00270 

WAL00280 

WAL00290 

WAL00300 

WAL00310 

WAL00320 

WAL00330 

WAL00340 

WAL00350 

WAL00360 

WAL00370 

WAL00380 

WAL00390 

WAL00400 

WAL00410 

WAL00420 

WAL00430 

WAL00440 

WAL00450 

WAL00460 

WAL00470 

WAL00480 

WAL00490 

WAL00500 

WAL00510 

WAL00520 

WAJh00530 

WAL00540 

WAL00550 

WAL00560 

WAL00570 
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IF (IM.GE. 6) GO TO 70 
WRITE (*, 1) IM,K 

1 FORMAT (' IM = ',12,', THE SEQ IS ZW2(I,J), J=1,K=\I3) 

DO 60 1=1, NP1 

60 WRITE (*, 3)1, (ZW2 (I, J) , J=l» K) 

3 FORMAT (14, 2X,16F4.1) 

70 CONTINUE 

C 

C CHECK TOTAL NUMBER OF AXIS CROSSINGS FOR EACH WALSH FUNCTIONS 
C 

DO 80 J=1,NTVEC 
DO 80 I=1,NP1-1 

IF (Z (I, J) .NE. Z (1+1, J) ) NZERO ( J) =NZERO{ J) +1 
80 CONTINUE 
C 

C THE FOLLOWING 2 STATEMENTS CAN BE USED FOR INTERNAL CHECKING 

C 

C DO 85 11=1 , NTVEC/ 8 

C 85 WRITE (11, 86) (NZERO ( J) , J=l+ (11-1) *8, 11*8) 

86 FORMAT (818) 

C WRITE (*,*) (NZERO (J) ,J=1, NTVEC) 

DO 90 J=l, NTVEC 
IF (NZERO (J) .NE. (J-l) ) GO TO 200 
90 CONTINUE 
C 

C SEND THE RESULTS TO OUTPUT FILE 
C 

DO 140 J=l, NTVEC 
DO 140 K=1 , NP1/NF 4 

140 WRITE (11, 4) (Z (I, J) ,1=1+ (K-l) *NF4,K*NF4) 

C 

C CHOOSE FORMAT (10F8.1) IF NF4=10 INSTEAD OF 5 
C 

C 4 FORMAT (10F8.1) 

4 FORMAT ( 5E1 5 . 7 ) 

200 STOP 

END 


WAL00580 

WAL00590 

WAL00600 

WAL00610 

WAL00620 

WAL00630 

WAL00640 

WAL00650 

WAL00660 

WAL00670 

WAL00680 

WAL00690 

WAL00700 

WAL00710 

WAL00720 

WAL00730 

WAL00740 

WAL00750 

WAL00760 

WAL00770 

WAL00780 

WAL00790 

WAL00800 

WAL00810 

WAL00820 

WAL00830 

WAL00840 

WAL00850 

WAL00860 

WAL00870 

WAL00880 

WAL00890 

WAL00900 

WAL00910 

WAL00920 

WAL00930 

WAL00940 


PROGRAM INFCLIP INF00010 

PARAMETER (NP1=1 00 ,NTERM=1 6, IEV=1) INF00020 

REAL X(NP1) INF00030 

c INF00040 

C NP1 : RAW DATA DIMENSIONALITY INF00050 

C NTERMS : TOTAL NUMBER OF OPTIMAL FUNCTIONS USED IN THE ALGORITHMINF00060 

C x : INPUT AND OUTPUT VARIABLE INF00070 

c INF00080 

C IEV : INPUT FILE READING INDEX ( CHOOSE EITHER 1 OR 0 ) INF00090 

C IEV = 1 IF INPUT FILE CONTAINS TRACE, EIGENVALUES AND THEIR SUM INF00100 

C IEV = 0 IF INPUT FILE CONTAINS ONLY EIGENVECTORS INF00110 

c INF00120 

r INF00130 


11 : INPUT EV FILE; 12 : OUTPUT INF. CLIPPED OPT. FEATURE FILE INF00140 

INF00150 
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OPEN (11) 

OPEN (12) 

REWIND 11 
REWIND 12 

FIND NORMALIZATION FACTOR 

IF (NP1 . EQ. 100)XNP1=FLOAT (NP1-20) 

IF (NP1 . EQ . 200 ) XNP1=FL0AT (NP1 ) 

READ INPUT EIGENVECTORS FOR TWO POSSIBLE CASES 

IF (IEV.EQ. 0) GO TO 15 
READ (11,*) XI, X2 
DO 10 I=l,NPl/5 
10 READ (11,*)X1,X2,X3,X4,X5 

FIND INFINITE CLIPPED VERSION FOR EVERY OPTIMAL FUNCTION 

15 DO 50 ITERM=1 , NTERM 
DO 20 J=l,NPl/5 

20 READ (11,*) (X(I) ,I=1+(J-1)*5, J*5) 

XN1=1 . / SQRT (XNPl ) 

DO 30 J=l, NP1 

IF (NP1 . EQ . 100 . AND . J . GE . 4 5 . AND . J . LE . 54 ) X ( J) =0 . 0 
IF (NP1.EQ.100.AND. J.GE.7Q.AND. J.LE . 79) X ( J) =0 . 0 
IF (NP1 . EQ . 200 . AND . J . GE . 1 . AND . J. LE . 3 ) X ( J) =0 . 0 
IF (NP1.EQ.200.AND. J.GE. 193. AND. J.LE. 200) X(J) =0.0 
IF (X ( J) . GT . 0 . 0 ) X ( J) =XN1 
IF (X ( J) . LT . 0 . 0 ) X ( J) =-XNl 
30 CONTINUE 

SEND THE RESULT TO THE OUTPUT FILE 
DO 40 J=l,NPl/5 

40 WRITE (12, 41) (X (I) , 1=1+ ( J-l) *5, J*5) 

41 FORMAT (5E15. 7) 

50 CONTINUE 

STOP 

END 


INF00160 

INF00170 

INFQ0180 

INF00190 

INF00200 

1NF00210 

INF00220 

INF00230 

INF00240 

INF00250 

INF00260 

INF00270 

INF00280 

INF00290 

INF00300 

INF00310 

INF00320 

INF00330 

INF00340 

INF00350 

INF00360 

INF00370 

INF00380 

INF00390 

INF00400 

INF00410 

INF00420 

INF00430 

INF00440 

INF00450 

INF00460 

INF00470 

INF00480 

INF00490 

INF00500 

INF00510 

INF00520 

INF00530 

INF00540 

INF00550 


PROGRAM OLBS OLBO ?2i2 

PARAMETER (NP1=100, NTERM=6, NV=120, NZ1=2*NV,NZ2=NP1*NV, OLB00020 

+N1=1, N2=100, W1=0 .40, DW=0 . 02, NVX=40, NV2=NV*NV) OLB00030 

REAL X (NP1, NTERM) , SI (NP1) ,Z (NP1,NV) ,T1 (NV) , OLB00040 

+TEST (NP1 , NV) , A (NP1 , NVX) OLB00050 

INTEGER NX (NTERM) ,NEDGE (2, NV) ,NWID (NV) ,NRANK(NV) ,NREP (NV) , OLB00060 

+MREP (NV) OLB00070 

DATA NX, Z, NEDGE, NWID/NTERM*0, NZ2*0 . 0, NZ1*0, NV*0/ OLB00080 

DATA NREP, TEST/NV*1, NZ2*0 . 0/ OLB00090 

OLB00100 

NP1 : RAW DATA DIMENSIONALITY OLBOOllO 
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c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


c 

c 

c 


c 

c 

c 


NTERM 

NV 

N1 

N2 

W1 

DW 

NVX 

X 

SI 

z 

T1 

TEST 

A 

NX (K) 

NEDGE 

NWID 

NRANK 

NREP 

MREP 

NREP = 
MREP = 


TOTAL NUMBER OF OPTIMAL FUNCTIONS USED IN THE ALGORITHM OLB00120 


PRESET TOTAL NUMBER OF L.D. BANDS, 
STARTING WAVELENGTH POINT 
ENDING WAVELENGTH POINT 
STARTING WAVELENGTH IN MICRO METER 
SPECTRAL RESOLUTION ( UM ) 

PRESET TOTAL NUMBER OF L.I. BANDS, 


INCREASE IT IF NEEDEDOLB00130 

OLB00140 

OLB00150 

( UM ) OLB00160 

OLB00170 

INCREASE IT IF NEEDEDOLB00180 

OLB00190 


INPUT EIGENVECTOR MATRIX 
SIGNED VERSION OF THE EIGENVECTOR 
L.D. BAND FEATURES 
TEMPORARY STORAGE VECTOR 

OUTPUT 0. L . BAND FEATURES ( L.I. FEATURES ) 
INTERMEDIATE MATRIX FOR RANK TEST 

TOTAL NO. OF L.D. BANDS FOR THE FIRST K EIGENVECTOR (S) 
BAND EDGES FOR EACH L.D. BANDS 
BAND WIDTH FOR EACH L.D. BANDS 
POSITIONS OF THE RANKED FEATURES BY THE WIDTHS 
INDEX SHOWS IF THE L.D. BANDS ARE REPEATED 
INDEX SHOWS IF THE BANDS ARE L.I. BANDS 

1 IF NON-REPEATED BAND ; NREP = 0 IF REPEATED 
1 IF L.I. BAND ; MREP = 0 IF L.D. 


11 

12 

13 


INPUT EIGENVECTOR FILE 

FIRST OUTPUT FILE L.D. AND L.I, 

SECOND OUTPUT FILE DESIRED O.L. 


BAND INFORMATION 
BAND FEATURE 


10 


20 


OPEN (11) 

OPEN (12) 

OPEN (13) 

REWIND 11 
REWIND 12 
REWIND 13 

READ IN EIGENVECTORS 

READ (11,*) XI, X2 

DO 10 I=l,NPl/5 

READ (11,*)X1,X2, X3, X4, X5 

DO 20 J=1 , NTERM 

DO 20 I=l,NPl/5 

READ (11, *) (X(K, J) ,K=1+ (1-1) 


'5,1*5) 


40 


FIND THE L.D, BAND FEATURES FROM FIRST 'NTERM' OPTIMAL FUNCTIONS 
IVEC=1 

DO 70 J=l, NTERM 
DO 40 1=1, NP1 

IF(X(I, J) .LT. 0.0) SI (I) =-1.0 

IF (X (I, J) .GT. 0.0) SI (I) =+1.0 

IF (X (I, J) .EQ. 0 . 0) SI (I) =0 . 0 

IF (NP1 .NE . 100) GO TO 40 

IF (I. GE. 45. AND. I. LE. 54) SI (I) =0.0 

IF (I. GE. 70. AND. I. LE. 79) SI (I) =0.0 

CONTINUE 

Z (N1 , IVEC) =ABS (SI (N1 ) ) 


OLB00200 

OLB00210 

OLB00220 

OLB00230 

OLB00240 

OLB00250 

OLB00260 

OLB00270 

OLB00280 

OLB00290 

OLB00300 

OLB00310 

OLB00320 

OLB00330 

OLB00340 

OLB00350 

OLB00360 

OLB00370 

OLB00380 

OLB00390 

OLB00400 

OLB00410 

OLB00420 

OLB00430 

OLB00440 

OLB00450 

OLB00460 

OLB00470 

OLB00480 

OLB00490 

OLB00500 

OLB00510 

OLB00520 

OLB00530 

OLB00540 

OLB00550 

OLB00560 

OLB00570 

OLB00580 

OLB00590 

OLB00600 

OLB00610 

OLB00620 

OLB00630 

OLB00640 

OLB00650 

OLB00660 

OLB00670 

OLB00680 
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DO 60 I=N1+1,N2 
IF (NP1 .NE. 100) GO TO 50 
IF (I . GE . 45 .AND . I . LE . 54 ) GO TO 60 
IF ( I . GE . 7 0 . AND . I . LE . 7 9 ) GO TO 60 
50 IF(S1(I-1) .NE.S1 (I) )IVEC=IVEC+1 
IF (IVEC.GT.NV) GO TO 350 
Z (I, IVEC) =ABS (SI (I) ) 

60 CONTINUE 
NX ( J) =IVEC 
IVEC=I VEC+ 1 
70 CONTINUE 

FIND THE BAND EDGES AND BAND WIDTH FOR EACH L.D. BAND FEATURES 

NVTOT=NX (NTERM) 

DO 90 J=1 , NVTOT 
11=0 
12=0 

DO 80 1=1, NP1 
CK1=Z (I, J) 

IF (CK1 . EQ . 0 . 0 ) GO TO 80 
IF (CK1. NE. 0.0. AND. II. EQ. 0)11=1 
IF (CK1 . NE . 0 . 0 .AND . II .NE . 0) 12=1 
80 CONTINUE 

IF (12. EQ. 0)12=11 
NEDGE (1, J) =11 
NEDGE (2, J) =12 
NWID(J) =12-11+1 
90 CONTINUE 

FIND THE WAVELENGTH EDGES AND SEND THEM TO THE FIRST OUTPUT FILE 


DO 100 J=l, NTERM 
WRITE (12, *) J 
IF (J.EQ. 1)NS1=NX (J) 

IF ( J.NE . 1) NS1=NX ( J) -NX(J-l) 

DO 100 1=1, NS1 

IF (J.EQ . 1 ) NS2=I 

IF ( J.NE . 1) NS2=I+NX ( J-l ) 

I1=NEDGE (1, NS2) 

I2=NEDGE ( 2 , NS2 ) 

XW1=W1 +FLOAT (11-1) *DW 
XW2=Wl+FLOAT (12) *DW 

WRITE (12,101) NS2 , I , NEDGE ( 1 , NS2 ) , NEDGE (2, NS2) , XW1 , XW2 , NWID (NS2 ) 

100 CONTINUE 

101 FORMAT (215, 2X, 13, IX, 1 , 13, 2X, ' ; ' , F5 . 2, IX, ' ,F5. 2, 15) 

PRINT* ,' TOTAL NUMBER OF BANDS IS = ', NVTOT 

RANK THE L.D. BAND ACCORDING TO THEIR WIDTHS IN DESCENDING ORDER 
AND SEND THE RESULTS TO THE FIRST OUTPUT FILE 

DO 110 1=1, NV 
110 T1 ( I ) =FLOAT (NWID ( I ) ) 

DO 120 1=1, NVTOT 

CALL VABMXF (T1 (1) ,NV,1, IMAX,BIG) 

NRANK (I) =IMAX 

WRITE (12, *) I, NRANK (I) , NEDGE (1, IMAX) , NEDGE (2, IMAX) ,NWID(IMAX) 


OLB00690 

OLB00700 

OLB00710 

OLB00720 

OLB00730 

OLB00740 

OLB00750 

OLB00760 

OLB00770 

OLB00780 

OLB00790 

OLB00800 

OLB00810 

OLB00820 

OLB00830 

OLB00840 

OLB00850 

OLB00860 

OLB00870 

OLB00880 

OLB00890 

OLB00900 

OLB00910 

OLB00920 

OLB00930 

OLB00940 

OLB00950 

OLB00960 

OLB00970 

OLB00980 

OLB00990 

OLB01000 

OLB01010 

OLB01020 

OLB01030 

OLB01040 

OLB01050 

OLB01060 

OLB01070 

OLB01080 

OLB01090 

OLBOllOO 

OLB01110 

OLB01120 

OLB01130 

OLB01140 

OLB01150 

OLB01160 

OLB01170 

OLB01180 

OLB01190 

OLB01200 

OLB01210 

OLB01220 

OLB01230 

OLB01240 

OLB01250 
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120 T1 (IMAX) =0 . 0 
C 

C CHECK IF THE L.D. BAND IS REPEATED. IF IT IS, SET NREP(I) = 0 
C 

DO 140 1=1 , NVTOT 
DO 130 J=l, NVTOT 
IF(I.EQ.J)GO TO 130 
Il=NRANK (I ) 

I2=NRANK ( J) 

I3=NWID (II) 

I4=NWID(I2) 

IF (13 ,NE. 14) GO TO 130 
I START=NEDGE (1,11) 

JSTART=NEDGE (1,12) 

IEND=NEDGE (2,11) 

JEND=NEDGE (2,12) 

IF (ISTART . EQ . JSTART . AND . IEND . EQ . JEND . AND . I . GT . J) NREP (I)=0 

130 CONTINUE 

IF (NREP ( I ) . EQ . 0 ) GO TO 140 
IX=NRANK ( I ) 

C 

C THE FO LL OWING WRITE STATEMENT CAN BE USED FOR INTERNAL CHECKING 
C 

C WRITE (12, 131) I, NREP (I) ,NRANK (I) ,NEDGE(1,IX) ,NEDGE (2, IX) ,NWID (IX) 

131 FORMAT (314, 5X, 14, ' -',I4,5X,I4) 

140 CONTINUE 

C 

C FIND TOTAL NUMBER OF NON-REPEATED L.D. BAND 

C 

NDIFF=0 

DO 150 1=1, NVTOT 
IF (NREP (I) . EQ. 1)NDIFF=NDIFF+1 
150 MREP ( I ) =NREP ( I ) 

PRINT*, 'TOTAL NUMBER OF NON-IDENTICAL BANDS IS =',NDIFF 
C 

C FIND L.I. BAND BY CHECKING THE MATRIX RANK 
C 

ILI=1 

JWID=1 

DO 300 J=l, NVTOT 

IF (NREP ( J) . EQ . 0 ) GO TO 300 

JR=NRANK ( J) 

DO 160 1=1, NP1 
160 TEST (I, ILI) =Z (I, JR) 

DO 170 KI=1, NP1 
DO 170 K J=1 , ILI 
170 A (KI, KJ) =TEST (KI, KJ) 

C 

C REDUCE THE MATRIX A TO ITS ECHELON FORM 
C 

CALL ECHEL (A,NP1,NVX,NP1, ILI) 

IEV=0 

DO 190 KI=1,NP1 
DO 180 KJ=1, ILI 
IF (A (KI, KJ) .NE . 0.0) IEV=IEV+1 
IF (A (KI , KJ) . NE .0.0) GO TO 190 
180 CONTINUE 
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190 CONTINUE 

SEND THE RANK INFORMATION TO THE FIRST OUTPUT FILE 

WHERE ' IEV' IS THE RANK AND 'ILI' IS TOTAL NUMBER OF BANDS TESTED 

WRITE (12, *) 'IEV=',IEV, ILI=' , ILI, ' AT J=',J 
IF (IEV. LT . ILI ) WRITE (12, *) ' IEV. LT. ILI AT J=',J 
IF ( IEV . LT . ILI ) GO TO 200 

IF RANK IS EQUAL TO TOTAL NO. OF BANDS, TEST THE NEXT WIDEST BAND 

IF (IEV. EQ. ILI) ILI=ILI+1 
GO TO 300 

IF RANK IS LESS THEN TOTAL NO. OF BANDS, 

ELIMINATE THE WIDEST L.D. BAND 

200 DO 250 JXLD=1 , ILI 
DO 210 K J=1 , ILI 
DO 210 KI=1, NP1 
210 A (KI, KJ) =TEST (KI, KJ) 

DO 220 KI=1, NP1 
220 A (KI, JXLD) =TEST (KI, ILI ) 

JLI=ILI-1 

CALL ECHEL (A,NP1, NVX,NP1, JLI) 

IEV=0 

DO 240 KI=1,NP1 
DO 230 K J=1 , JLI 
IF (A (KI , K J) . NE . 0 . 0 ) IEV=IEV+1 
IF (A (KI , K J) . NE .0.0) GO TO 240 
230 CONTINUE 
240 CONTINUE 

PRINT* , ' IEV= ' , IEV, ' ; ILI= ' , ILI , ' AT J= ' , J 
IF ( IEV. LT. ILI) PRINT*, 'IEV. LT. ILI AT J=’,J 
IF ( IEV . EQ . JLI ) J2LD= JXLD 
IF (IEV. EQ. JLI) GO TO 260 
250 CONTINUE 
260 11=0 
12=0 

DO 270 KI=1, NP1 
CK1=TEST (KI, J2LD) 

IF (CK1 . EQ . 0 . 0 ) GO TO 270 
IF (CK1 . NE . 0 . 0 . AND . I 1 . EQ . 0 ) I 1=KI 
IF (CK1 .NE . 0 . 0 . AND . II .NE . 0) I2=KI 
270 CONTINUE 

IF (12 . EQ . 0) 12=11 
DO 275 KI=1 , NVTOT 
IF (MREP (KI ) . EQ . 0 ) GO TO 275 
MAX=NRANK (KI ) 

MEDGE1=NEDGE (1,MAX) 

MEDGE2=NEDGE (2, MAX) 

IF (II . EQ . MEDGE1 . AND . 12 . EQ . MEDGE2) J1LD=KI 
IF (II .EQ.MEDGE1 . AND . 12 . EQ.MEDGE2) GO TO 280 
275 CONTINUE 
280 MREP ( J1LD) =0 

SEND THE POSITION OF THE WIDEST L.D. BAND FEATURE 


OLB01830 

OLB01840 

OLB01850 

OLB01860 

OLB01870 

OLB01880 

OLB01890 

OLB01900 

OLB01910 

OLB01920 

OLB01930 

QLB 01 940 

OLB01950 

OLB01960 

OLB01970 

OLB01980 

OLB01990 

OLB02000 

OLB02010 

OLB02020 

OLB02030 

OLB02040 

OLB02050 

OLB02060 

OLB02070 

OLB02080 

OLB02090 

OLB02100 

OLB02110 

OLB02120 

OLB02130 

OLB02140 

OLB02150 

OLB02160 

OLB02170 

OLB02180 

OLB02190 

OLB02200 

OLB02210 

OLB02220 

OLB02230 

OLB02240 

OLB02250 

OLB02260 

QLB02270 

OLB02280 

OLB02290 

OLB02300 

OLB02310 

OLB02320 

OLB02330 

OLB02340 

OLB02350 

OLB02360 

OLB02370 

OLB02380 

OLB02390 
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TO THE FIRST OUTPUT FILE WHERE : 

J1LD IS THE POSITION ON THE VARIABLES NREP AND MREP 
J2LD IS THE POSITION ON THE RANK CHECKING MATRIX 

WRITE (12, *) 'J =',J, JlLD = ' , J1LD, ' ; J2LD =',J2LD 
DO 290 J1=J2LD, ILI-1 
DO 290 11=1, NP1 
290 TEST (II, Jl) =TEST (II, Jl+1) 

300 CONTINUE 

SEND THE L.I. INDEX TO THE FIRST OUTPUT FILE 

PRINT*, 'TOTAL NUMBER OF L.I. BANDS IS =',IEV 
DO 310 1=1 , NVTOT 
310 WRITE (12,*) I, NREP (I) , MREP (I) 

NORMALIZE THE O.L. BANDS AND SEND THEM TO THE SECOND OUTPUT FILE 

DO 330 J=1 , IEV 
XN1=0 . 0 

DO 320 1=1, NP1 
IF (TEST (I, J) . EQ . 1 ) XN1=XN1 + 1 
320 CONTINUE 

DO 330 1=1, NP1 

330 TEST (I, J) =TEST (I, J) /SQRT (XN1) 

DO 340 J=l, IEV 
DO 340 K=l,NPl/5 
J1=IEV-J+1 

340 WRITE (13, 341) (TEST (I, J) , 1=1+ (K-l) *5, K*5) 

341 FORMAT ( 5E1 5.7) 

GO TO 360 

350 PRINT*, 'TOTAL NUMBER OF BANDS IS OUT OF PRESET LIMIT' 

360 STOP 
END 

SUBROUTINE ECHEL (A, NP1 , NVX, NROW, NCOL) 

REAL A (NP1,NVX) 

THIS SUBROUTINE REDUCES MATRIX A INTO ITS ECHELON FORM 

JCOL=l 
IROW=l 

5 DO 100 I=IROW, NROW 

IF(A(I, JCOL) .EQ.0.0)GO TO 100 
C INTERCHANGE I AND IROW TO GET NONZERO PIVOT 

IF ( I . EQ . IROW) GO TO 20 
DO 10 J= JCOL, NCOL 
XI =A (I, J) 

A (I, J) =A (IROW, J) 

10 A (IROW, J) =X1 

C NORMALIZE ROW TO GET POSITIVE NUMBER FOR PIVOT 
20 IF (A (IROW, JCOL) .GT.0.0)GO TO 40 
DO 30 J= JCOL, NCOL 
30 A (IROW, J) =-A (IROW, J) 

40 IF ( IROW. GE. NROW) RETURN 
C ZERO COLUMN BELOW PIVOT 

IROWX=IROW+l 


OLB02400 

OLB02410 

OLB02420 

OLB02430 

OLB02440 

OLB02450 

OLB02460 

OLB02470 

OLB02480 

OLB02490 

OLB02500 

OLB02510 

OLB02520 

OLB02530 

OLB02540 

OLB02550 

OLB02560 

OLB02570 

OLB02580 

OLB02590 

OLB02600 

OLB02610 

OLB02620 

OLB02630 

OLB02640 

OLB02650 

OLB02660 

OLB02670 

OLB02680 

OLB02690 

OLB02700 

OLB02710 

OLB02720 

OLB02730 

OLB02740 

OLB02750 

OLB02760 

OLB02770 

OLB02780 

OLB02790 

OLB02800 

OLB02810 

OLB02820 

OLB02830 

OLB02840 

OLB02850 

OLB02860 

OLB02870 

OLB02880 

OLB02890 

OLB02900 

OLB02910 

OLB02920 

OLB02930 

OLB02940 

OLB02950 

OLB02960 
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DO 60 K=I ROWX , NROW 
X1=A(K, JCOL) 

IF (XI . EQ . 0 . 0 ) GO TO 60 
DO 50 J=JCOL, NCOL 
50 A (K, J) =-Xl*A (IROW, J) +A (K, J) 
60 CONTINUE 
IROW=IROW+l 
JCOL= JCOL+ 1 
GO TO 5 
100 CONTINUE 

IF (IROW. GT . NROW) RETURN 

JCOL=JCOL+l 

GO TO 5 

END 


OLB02970 

OLB02980 

OLB02990 

OLB03000 

OLB03010 

OLB03020 

OLB03030 

OLB03040 

OLB03050 

OLB03060 

OLB03070 

OLB03080 

OLB03090 

OLB03100 


C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


PROGRAM CLST 

PARAMETER (NTERM=1 6, MTERM=NTERM* (NTERM+1) /2, NCLS=3, NP1=100, 
+NSET=1 , MSET=1, NDSET=1 , NTSET=1 , NF2=10, NF3=5, NSMAX=1000, 

+NKLT=0 , IEV=0 , NLI=1 6 , VLD=-0 . 0 , NSAMP=1 0 , NF=NF2 ) 

NKLT = 1 : JUST FIND TRANSFORMED DATA XKLT 

NKLT = 0 : FIND XKLT AND CLASS STATISTICS 

NF = NF2 = 10 USED TO READ (10F8.3) RAW DATA 

NF = NF3 = 5 USED TO READ (5E15.7) CANONICAL TRANSFORMED DATA 
WHEN : NF = NF3 = 5 — > NP1 MUST BE REDUCED TO LOWER DIM. 


WHEN : 

NTERM 

NCLS 

NP1 


NLI 

NSMAX 

NSAMP 


TOTAL NUMBER OF FEATURES (MAY NOT ALL BE NUMERICALLY L . I . ) CLS00130 
TOTOAL NUMBER OF INFORMATION CLASSES CLS00140 

DIMENSIONALITY OF INPUT DATA CLS00150 

NP1 = RAW DATA DIMENSIONALTY IF USED IN DATA PREPROCESSINGCLS00160 
NP1 = TRANSFORMED DATA DIMENSIONALTY IF USED IN CAN. ANAL . CLS00170 
INPUT FEATURE READING INDEX, EITHER 1 OR 0 CLS00180 

IEV = 0 IF FEATURE FILE DOES NOT CONTAIN TRACE & EVALUES CLS00190 
IEV = 1 IF FEATURE FILE CONTAINS TRACE & EVALUES CLS00200 

TOTAL NUMBER OF L.I. FEATURES DESIRED CLS00210 

PRESET MAX. NO. OF SAMPLES FOR ONE CLASS CLS00220 

TOTAL NUMBER OF TEST SAMPLES USED TO CHECK POS. DEF . CLS00230 

CLS00240 

r*r cnno c: n 


REAL X (NSMAX, NTERM) , Z (NP1 , NTERM) , RX (NP1 ) , 

+T1 (NP1) ,T2 (NP1) , T3 (NP1) , XT (NP1) , XM (NP1) , D (NP1) , 
+XMCT (NTERM, NCLS) ,XMC (NTERM) ,W(NP1) ,T(NP1), 

+VCT (MTERM, NCLS) , VC (MTERM) , CT (NCLS) , 

+VCIT (MTERM, NCLS) ,VCI (MTERM) , TEST (NTERM, NTERM) , 
+VCIF (NTERM, NTERM) , VCF (NTERM, NTERM) , 

+VCTF (NTERM, NTERM, NCLS) , XMCTF (NTERM, NCLS) , 

+VCTLI (MTERM, NCLS) , XMTLI (NTERM, NCLS) , 

+WK (NTERM) ,VCV (MTERM) ,VEC (NSAMP, NTERM) 

INTEGER NBR(6) ,NST (NCLS,NTSET) 

DOUBLE PRECISION DSEED 

DATA (NBR(I) ,1=4, 6) ,W/1,0, 0,NP1*1.0/ 


CLS00260 

CLS00270 

CLS00280 

CLS00290 

CLSQQ300 

CLS00310 

CLS00320 

CLS00330 

CLS00340 

CLS00350 

CLS00360 

CLS00370 

CLS00380 
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c 

X 

= 

TRANSFORMED DATA 

c 

z 

= 

FEATURES 

c 

RX 

= 

TEMPORARY STORAGE FOR FEATURES 

c 

XT 


INPUT DATA 

c 

XM 

= 

MEAN VECTOR 

c 

D 


EIGENVALUES 

c 

XMCT 


MEAN VECTOR FOR ALL CLASSES 

c 

XMC 

- 

MEAN VECTOR FOR ONE CLASS 

c 

VCT 

= 

COVARIANCE MATRIX FOR ALL CLASSES 

c 

VC 

s= 

COVARIANCE MATRIX FOR ONE CLASS 

c 

VCIT 

= 

INVERSE MATRIX OF ALL CLASS COVARIANCE MATRICES 

c 

VCI 

= 

INVERSE MATRIX OF ONE CLASS COVARIANCE MATRIX 

c 

TEST 

= 

INTERNAL MATRIX INVERSION CHECKING MATRIX 

c 

VCTLI 


COV. MATRIX FOR ALL CLASSES BY USING ALL L.I. FEATURES 

c 

XMTLI 

= 

MEAN VECTOR FOR ALL CLASSES BY USING L.I. FEATURES 

c 

WK 

= 

WORKING SPACE FOR IMSL ROUTINES 

c 

VCV 

3= 

COV. MATRIX USED TO TEST ITS POSITIVE DEFINITENESS 

c 

VEC 

= 

GENERATED SAMPLES USED TO TEST POSITIVE DEFINITENESS 


c 

c »CHOOSE OR TYPE IN THE CORRECT NUMBERS OF SAMPLES IN THE DATA SETS 

C 

C 

C NSET FI 
C 1 M2611KI 

C 2 M2611K2 

C 3 M2611K3 

C 4 M2614N1 

C 5 M2614N2 

C 6 M2614N3 

C 

C DATA NST/141, 414, 277, 658, 211, 682, 677, 643, 157/ 

C DATA NST/141, 414, 277, 658, 211, 682, 677, 643, 157, 

C +664, 437, 164,787, 291, 161,931, 330, 183/ 

C DATA NST/664, 437, 164, 787,291, 161, 931, 330, 183/ 

C DATA NST/141, 414, 277, 658, 211, 682, 677/ 

C DATA NST/587, 216,121/ 

DATA NST/658,211, 682/ 


NP2 

A 

] 

832 

WW: 141 

SF 

1551 

WW: 658 

SF 

1477 

WW: 677 

SF 

1265 

SW: 664 

SF 

1239 

SW: 787 

SF 

1444 

SW: 931 

SF 


C DACO 
414 GS : 277 760928 
211 UC: 682 770503 
643 GS : 157 770626 
437 NP : 164 770508 
291 NP: 161 770629 
330 NP : 183 770804 


EXNU RUSE 
76102207 1-1622 

77102207 6515-8096 
77102207 8097-9691 
77102217 1-1396 

77102217 2777-4141 
77102217 5426-6993 


C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 


THE FOLLOWING DATA 'NST' ARE USED FOR SOIL ORDER DATA SET. 'SO' 

NP2=479; MOL ALF EN AR UL IN SP VE H OX UNCLASSIFIED 
DATA NST/154, 113, 78, 52, 45, 37, 30, 11, 8, 11, 32/ 

DATA NST/154, 113, 78, 52, 45, 97/ 

DATA NST/154, 113, 78, 52, 45, 37/ 


THE FOLLOWING DATA 'NST' IS USED FOR SOIL 'OM1' DATA SET 
I.E. (1) MOLLI SOL , OR (2) ALFISOL, AND GROUP SAMPLES 
ACCORDING TO THEIR ORGANIC MATERIAL: % WEIGHT 
CLASS 1 TO 6 : NP2 = 255 


CLSl 


.11% 

.GE. 

OM 

.LE. 

1.5% 

# 

1 • 

-> # 51 

CLS2 


1.5% 

.GT. OM 

.LE. 

2.0% 

# 

52 

-> # 104 

CLS3 


2.0% 

.GT. 

OM 

.LE. 

2.5% 

# 

105 

-> # 138 

CLS4 


2.5% 

,GT. 

OM 

.LE. 

3.5% 

# 

139 

-> # 183 

CLS5 


3.5% 

.GT. 

OM 

.LE. 

5.0% 

# 

184 

-> # 222 


CLS00390 

CLS00400 

CLS00410 

CLS00420 

CLS00430 

CLS00440 

CLS00450 

CLS00460 

CLS00470 

CLS00480 

CLS00490 

CLS00500 

CLS00510 

CLS00520 

CLS00530 

CLS00540 

CLS00550 

CLS00560 

CLS00570 

CLS00580 

CLS00590 

-CLS00600 

CLS00610 

CLS00620 

CLS00630 

CLS00640 

CLS00650 

CLS00660 

CLS00670 

CLS00680 

CLS00690 

CLS00700 

CLS00710 

CLS00720 

CLS00730 

CLS00740 

CLS00750 

CLS00760 

■CLS00770 

CLS00780 

CLS00790 

CLS00800 

CLS00810 

CLS00820 

CLS00830 

CLS00840 

-CLS00850 

CLS00860 

CLS00870 

CLS00880 

CLS00890 

CLS00900 

CLS00910 

CLS00920 

CLS00930 

CLS00940 

CLS00950 
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C CLS6 : 5.0% .GT. OM .LE. 10.12% : # 223 -> # 255 CLS00960 
c CLS00970 

C DATA NST/51, 54, 33, 45, 39, 33/ CLS00980 
c CLS00990 

C DATA 'S2A' : ANOTHER TEST GROUPED BY THE SAME OM RANGES AS 'OM2 ' CLS01000 
C CM PERCENTAGE : 0,1; 1,2; 2,3; 3,4; 4,6; 6 AND ABOVE CLS01010 
c CLS01020 

C DATA NST/26, 78, 64, 32, 55/ CLS01030 
c CLS01040 
c CLS01050 

c CLS01060 

C THE FOLLOWING DATA 1 NST' IS USED FOR 'OM2* DATA SET CLS01070 
C ACCORDING TO THEIR ORGANIC MATERIAL: % WEIGHT CLS01080 
C CLASS 1 TO 6 : NP2 =514 CLS01090 
C CLS1 : .08% .GE. OM .LE. 1.0% : # 1 -> # 82 CLS01100 
C CLS2 : 1.0% .GT. OM .LE. 2.0% : # 83 -> # 217 CLS01110 
C CLS3 : 2.0% .GT. OM .LE. 3.0% : # 218 -> # 337 CLS01120 
C CLS4 : 3.0% .GT. OM .LE. 4.0% : # 338 -> # 391 CLS01130 
C CLS5 : 4.0% .GT. OM .LE. 6.0% : # 392 -> # 450 CLS01140 
C CLS6 : 6.0% .GT. OM .LE. 84.79% : # 451 -> # 514 CLS01150 
c CLS01160 

C DATA NST/82, 135, 120, 54, 59, 64/ CLS01170 
c CLS01180 

C DATA NST/82, 135, 120, 54, 123/ CLS01190 
C DATA NST/44, 31, 18, 23, 24, 51, 37, 27/ CLS01200 
C DATA NST/83, 57, 94, 31, 37, 59, 103, 26, 24/ CLS01210 
C DATA NST/103, 26, 24/ CLS01220 
c — CLS01230 

C THE FOLLOWING DATA ’NST' IS USED FOR SOIL IRON OXIDE '10' DATA SETCLS01240 
C ACCORDING TO THEIR FE203 % WEIGHT CLS01250 
C CLASS 1 TO 6 : NP2 =467 CLS01260 
C CLS1 : .02% .GE. FE203 .LE. 0.4% : # 1 -> # 102 CLS01270 
C CLS2 : 0.4% .GT. FE203 .LE. 0.6% : # 103 -> # 175 CLS01280 
C CLS3 : 0.6% .GT. FE203 .LE. 0.8% : # 176 -> # 244 CLS01290 
C CLS4 : 0.8% .GT. FE203 .LE. 1.2% : # 245 -> # 349 CLS01300 
C CLS5 : 1.2% .GT. FE203 .LE. 1.6% : # 350 -> # 401 CLS01310 
C CLS6 : 1.6% .GT. FE203 .LE. 25.60% : # 402 -> # 467 CLS01320 
c CLS01330 

C DATA NST/102, 73, 69, 105, 52, 66/ CLS01340 
c CLS01350 

c CLS01360 

C THE FOLLOWING DATA 'NST' IS USED FOR SOIL TEXTURE 'ST' DATA SET CLS01370 
C ACCORDING TO THEIR SAND-SILT-CLAY % CONTENT CLS01380 
C CLASS 1 TO 6 : NP2 = 483; DETAILS : SEE FILE ( S5L.DATA.C1) CLS01390 
c CLS01400 

C DATA NST/40, 63, 76, 93, 68, 143/ CLS01410 
c CLS01420 
c CLS01430 

C THE FOLLOWING DATA 'NST' IS USED FOR S.D. VEGETATION DATA CLSQ1440 
c CLS01450 

C DATA NST/225, 61,292,469, 82,182,63,103, 39,39,217,51, CLS01460 
C +393,441,80,88, 88,41,32,26, 118,43,121,44, 45,102,66,89, CLS01470 
C +78,53,147,39, 24,42,119,69, 76,96,107,154, 28,19/ CLS01480 
c CLS01490 
c — CLS01500 

C THE FOLLOWING DATA 'NST' IS USED FOR IOWA VEGETATION DATA CLS01510 
r CLS01520 
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C DATA NST/514,41, 517,36,32, 621,517,45, 610,485,21, 

C +437,377,22, 190,172,25, 650,568,42, 435,417,44, 393,267/ 
C 

C 

C 

C 11 = DATA; 12 = FEATURES; 13 = CLASS STATISTICS; 

C 14 = TRANSFORMED DATA ; 15 = LDBAND ; 16 = RANDOM 

C 

OPEN (11) 

OPEN (12) 

OPEN (13) 

OPEN (14) 

OPEN (15) 

REWIND 11 
REWIND 12 
REWIND 13 
REWIND 14 
REWIND 15 
C 

C SET UP DATA INPUT&OUTPUT DO LOOP PARAMETERS 

C 

IKl=MOD (NCLS, 6) 

IMl=6* (NCLS/ 6) +1 
ILPl=NCLS/6 
IF (ILP1 .EQ. 0) ILP1=1 
IK2=MOD (NTERM, 5) 

IM2=5* (NTERM/5) +1 
ILP2=NTERM/5 
IF (ILP2.EQ. 0) ILP2=1 
DO 650 ISET=NSET, MSET, NDSET 
C 

C READ FEATURE FILE IN TWO CASES ( IEV = 0 OR 1 ) 

C 

IF (IEV . EQ . 0 ) GO TO 10 
READ ( 1 2 ,*) TRACE , SUM 
CALL SRI (12,NP1,NF3,D) 

10 DO 30 JTERM=1, NTERM 
CALL SRI (12,NP1,NF3,RX) 

DO 20 1=1, NP1 
20 Z (I, JTERM) =RX (I) 

30 CONTINUE 
C 

C FIND MEAN VECTOR AND COVARIANCE MATRIX FOR EACH CLASS 
C IN THE FEATURE TRANSFORMED DATA 
C 

DO 150 LTERM=NTERM, NTERM 
KTERM=LTERM* (LTERM+1) /2 
DO 150 ICLS=1,NCLS 
NS=NST (ICLS, I SET) 

PRINT*, ' I SET = ', ISET, ' ; ' , LTERM, ICLS, NS 
DO 40 1=1, NSMAX 
DO 40 J=l, NTERM 
40 X(I,J)=0.0 

DO 100 ISAMP=1,NS 
CALL SRI (11,NP1,NF,XT) 

DO 70 JTERM=1 , LTERM 
DO 60 1=1, NP1 


CLS01530 

CLS01540 

CLS01550 

CLS01560 

CLS01570 

CLS01580 

CLS01590 

CLS01600 

CLS01610 

CLS01620 

CLS01630 

CLS01640 

CLS01650 

CLS01660 

CLS01670 

CLS01680 

CLS01690 

CLS01700 

CLS01710 

CLS01720 

CLS01730 

CLS01740 

CLS01750 

CLS01760 

CLS01770 

CLS01780 

CLS01790 

CLS01800 

CLS01810 

CLS01820 

CLS01830 

CLS01840 

CLS01850 

CLS01860 

CLS01870 

CLS01880 

CLS01890 

CLS01900 

CLS01910 

CLS01920 

CLS01930 

CLS01940 

CLS01950 

CLS01960 

CLS01970 

CLS01980 

CLS01990 

CLS02000 

CLS02010 

CLS02020 

CLS02030 

CLS02040 

CLS02050 

CLS02060 

CLS02070 

CLS02080 

CLS02090 
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T1 (I)=XT (I) 

T2 (I) =W (I) *T1 (I) 

60 T3(I)=Z (I, JTERM) 

CALL VIPRFF (T3,T2,NP1, 1, 1,XIP) 

70 X (ISAMP, JTERM) =XIP 
C 

C SEND THE RESULTS TO THE TRANSFORMED DATA FILE 
C 

IF (NTERM.LT. 5) GO TO 90 
DO 80 11=1, ILP2 

80 WRITE (14, 91) (X (ISAMP, J1 ) , Jl=l+ (11-1 ) *5, 11*5) 

IF (IK2 .EQ. 0) GO TO 100 

90 WRITE (14, 91) (X (ISAMP, Jl) , J1=IM2, NTERM) 

91 FORMAT ( 5E15 . 7 ) 

100 CONTINUE 

C 

C FIND THE CLASS STATISTICS IF NKLT = 0 
C 

IF (NKLT . EQ . 1 ) GO TO 150 
NBR(l) =LTERM 
NBR(2) =NS 
NBR(3)=NS 
DO 110 1=1, NP1 
110 T (I) =0 . 0 

CALL BECOVM (X, NSMAX, NBR, T, XMC, VC, IER) 

C 

C STORE THE CLASS STATISTICS FOR POSITIVE DEFINITENESS CHECKING 
C 

DO 120 I=1,LTERM 
C WRITE ( * , * ) LTERM, I , XMC ( I ) 

120 XMCT { I , ICLS ) =XMC ( I ) 

DO 130 1=1 , KTERM 
C WRITE (*, *) LTERM, I, VC (I) 

130 VCT (I, ICLS) =VC (I) 

PRINT*, ' THE IER MUST BE "0" FOR BECOVM ' 

PRINT*, IER 
150 CONTINUE 
C 

C STOP THE PROGRAM IF ONLY WANT TO FIND TRANSFORMED DATA (NKLT=1 ) 

C 

IF (NKLT . EQ . 1 ) GO TO 650 
C 

C STORE THE CLASS STATISTICS INTO FULL STORAGE MODE FOR CHECKING 
C 

DO 170 ICLS=1, NCLS 

DO 170 1=1, NTERM 

DO 160 J=1,I 

IND=I* (1-1) /2+J 

VCTF (I, J, ICLS) =VCT (IND, ICLS) 

VCTF ( J, I, ICLS) =VCTF (I, J, ICLS) 

C WRITE (*, *) I, J, IND, VCTF (I, J, ICLS) , VCTF (J, I, ICLS) 

160 CONTINUE 

XMCTF ( I , ICLS ) =XMCT ( I , ICLS ) 

170 CONTINUE 
C 

C START CHECKING THE POSITIVE DEFINITENESS OF THE COV. MATRICES 
C IF ' LTERM 'TH FEATURE IS L.D. ON THE OTHER FEATURES, THE RELATED 


CLS02100 

CLS02110 

CLS02120 

CLS02130 

CLS02140 

CLS02150 

CLS02160 

CLS02170 

CLS02180 

CLS02190 

CLS02200 

CLS02210 

CLS02220 

CLS02230 

CLS02240 

CLS02250 

CLS02260 

CLS02270 

CLS02280 

CLS02290 

CLS02300 

CLS02310 

CLS02320 

CLS02330 

CLS02340 

CLS02350 

CLS02360 

CLS02370 

CLS02380 

CLS02390 

CLS02400 

CLS02410 

CLS02420 

CLS02430 

CLS02440 

CLS02450 

CLS02460 

CLS02470 

CLS02480 

CLS02490 

CLS02500 

CLS02510 

CLS02520 

CLS02530 

CLS02540 

CLS02550 

CLS02560 

CLS02570 

CLS02580 

CLS02590 

CLS02600 

CLS02610 

C^S02620 

CLS02630 

CLS02640 

CLS02650 

CLS02660 
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C ELEM ENTS IN THE MEAN VECTORS AND COVARIANCES WILL BE REMOVED 

C 

ILI=1 

JLI=ILI*(ILI+l)/2 
DO 600 LTERM=1 , NTERM 
KTERM=LTERM* (LTERM+1) / 2 
DO 400 ICLS=1,NCLS 
IX=0 

DO 200 IROW=l , LTERM 
V1=0 . 0 

DO 180 JCK=1, LTERM 
180 Vl=Vl+VCTF (IROW, JCK, ICLS) 

VCK=VLD* LTERM 

IF (Vl . EQ . VCK) GO TO 200 

IX=IX+1 

IY=0 

DO 190 JCOL=l , LTEPM 
V2=VCTF (IROW, JCOL, ICLS) 

IF ( V2 . EQ . VLD ) GO TO 190 
IY=IY+1 
VCF (IX, IY) =V2 
190 CONTINUE 
200 CONTINUE 

C WRITE (15, *) IX, IY, ILI 

C PRINT*, 'IX, IY, ILI MUST BE THE SAME ' , IX, IY, ILI 

CALL VCVTFS (VCF, ILI, NTERM, VC) 

C WRITE (*,*) ICLS, VC (1) 

OPEN (16) 

REWIND 16 
DO 210 1=1, JLI 
WRITE (16, 211) VC (I) 

C VCV(I)=VC(I) 

210 VCTLI (I, ICLS) =VC (I) 

211 FORMAT (El 3. 5) 

OPEN (16) 

REWIND 16 

DO 220 1=1, JLI 
220 READ (16, 211) VCV (I) 

DO 230 1=1, NTERM 
230 WK (I) =0 . 0 
DSEED=5 , DO 
C 

C SECOND TEST ON NUMERICAL POSITIVE DEFINITENESS OF THE MATRICES 

C 

CALL GGNSM (DSEED , NSAMP , ILI , VCV, NS AMP , VEC , WK, IER) 

IF (IER.NE. 0) GO TO 440 
C WRITE (*,*) ICLS, VCTLI (1, ICLS) ,VC(1) 

C 

c CHECK IF ALL CLASS COVARIANCES HAVE INVERSE MATRICES 
C VC WILL BE CHANGED AFTER LINV1P 

C 

CALL LINV1P (VC, ILI, VCI , IDGT, Dl, D2, IER) 

C WRITE (*,*) ICLS, VCI (1) 

C PRINT*,’ THE FOLLOWING IER MUST BE 0 FOR LINV1P' 

C PRINT*, ISET, LTERM, ICLS, ' ; IER=',IER 

IF (IER.NE. 0) GO TO 450 
DO 240 1=1, JLI 


CLS02670 

CLS02680 

CLS02690 

CLS02700 

CLS02710 

CLS02720 

CLS02730 

CLS02740 

CLS02750 

CLS02760 

CLS02770 

CLS02780 

CLS02790 

CLS02800 

CLS02810 

CLS02820 

CLS02830 

CLS02840 

CLS02850 

CLS02860 

CLS02870 

CLS02880 

CLS02890 

CLS02900 

CLS02910 

CLS02920 

CLS02930 

CLS02940 

CLS02950 

CLS02960 

CLS02970 

CLS02980 

CLS02990 

CLS03000 

CLS03010 

CLS03020 

CLS03030 

CLS03040 

CLS03050 

CLS03060 

CLS03070 

CLS03080 

CLS03090 

CLS03100 

CLS03110 

CLS03120 

CLS03130 

CLS03140 

CLS03150 

CLS03160 

CLS03170 

CLS03180 

CLS03190 

CLS03200 

CLS03210 

CLS03220 

CLS03230 



ooo non non 


125 


240 VCIT (I, ICLS) =VCI (I) 

STORE BACK THE VALUES OF VCVC FROM VCVCF 

CALL VCVTFS (VCF, ILI, NTERM, VC) 

CALL VCVTSF(VCI, ILI, VCIF, NTERM) 

DET=Dl*2 . **D2 

CX= (2. *3. 14159)** (FLOAT (ILI)/ 2. ) 

C=1 . / (CX*SQRT (DET) ) 

CT (ICLS) =C 

IF ( ICLS . NE . NCLS ) GO TO 400 

SEND THE FINAL RESULTS TO THE CLASS STATISTICS FILE 

DO 250 KCLS=1 , NCLS 
IX=0 

DO 250 1=1, LTERM 
V3=XMCTF ( I , KCLS ) 

IF (V3 . EQ . VLD ) GO TO 250 
IX=IX+1 

XMTLI (IX, KCLS) =V3 
250 CONTINUE 

DO 280 1=1, ILI. 

IF (NCLS . LT . 6 ) GO TO 270 
DO 260 IL=1, ILP1 

260 WRITE (13, 321) (XMTLI (I,LCLS) ,LCLS=1+ (IL-1)*6,IL*6) 

IF ( IK1 . EQ . 0 ) GO TO 280 

270 WRITE (13, 321) (XMTLI (I, LCLS) , LCLS=IM1, NCLS) 

280 CONTINUE 

DO 310 1=1, JLI 
IF (NCLS. LT. 6) GO TO 300 
DO 290 IL=1, ILP1 

290 WRITE (13, 321) (VCTLI (I, LCLS) , LCLS=1+ (IL-1) *6, IL*6) 

IF (IK1 . EQ. 0) GO TO 310 

300 WRITE (13, 321) (VCTLI (I, LCLS) , LCLS=IM1, NCLS) 

310 CONTINUE 

IF (NCLS . LT . 6 ) GO TO 330 
DO 320 IL=1, ILP1 

320 WRITE (13, 321) (CT(LCLS) ,LCLS=1+ (IL-1) *6, IL*6) 

321 FORMAT (6E13. 5) 

IF (IK1 .EQ. 0) GO TO 340 
330 WRITE (13, 321) (CT (LCLS) , LCLS=IM1, NCLS) 

340 DO 370 1=1, JLI 

IF (NCLS . LT . 6 ) GO TO 360 
DO 350 IL=1, ILP1 

350 WRITE (13, 321) (VCIT (I, LCLS) , LCLS=1+ (IL-1) *6, IL*6) 

IF (IK1 . EQ. 0) GO TO 370 

360 WRITE (13, 321) (VCIT (I, LCLS) , LCLS=IM1, NCLS) 

370 CONTINUE 
400 CONTINUE 

INTERNAL CHECKING FOR ACCURACY OF MATRIX INVERSION 

DO 430 ICLS=1,NCLS 
DO 410 1=1, JLI 
VC (I) =VCTLI (I, ICLS) 

410 VCI (I) =VCIT (I, ICLS) 
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DO 420 1=1, JLI 

420 WRITE (*, *) VC (I) , VCI (I) 

CALL VMULSS (VC, VCI, ILI , TEST, NTERM) 

THE FOLLOWING 3 STATEMENTS CAN BE USED FOR MATRIX INVERSION CHECK 

PRINT*, ' THE FOLLOWING MATRIX MUST BE AN IDENTITY MATRIX ' 

DO 430 1=1, ILI 

WRITE (*,421) (TEST (I, J) , J=1,ILI) 

421 FORMAT (16F5. 2) 

430 CONTINUE 

PRINT*, * ILI =' , ILI 
ILI=ILI+1 
JLI=ILI* (ILI+1) / 2 
IF (ILI . GT . NLI ) GO TO 650 
GO TO 600 

SEND THE INFORMATION OF L.D. FEATURES & REASONS FOR 
NON-POSITIVE-DEFINITENESS OF COV. MATRICES TO THE FILE 1 LDBAND 1 

440 WRITE (15, *) 'GGNSM HAS IER.NE.0' 

450 WRITE ( 15 , *)' ISET =' , ISET, ' ; LTERM = ' , LTERM, ' ; ICLS =',ICLS 
PRINT*, 1 1 SET =', ISET, LTERM =', LTERM, ICLS =',ICLS 
DO 500 JCLS=1, NCLS 

THE FOLLOWING 5 STATEMENTS ARE USED FOR INTERNAL CHECKING 

WRITE (15,*) 'JCLS = ' , JCLS 
DO 460 1=1, NTERM 
460 WRITE (15, *) I, XMCTF (I, JCLS) 

DO 470 1=1, NTERM 

470 WRITE (15, 471) I, (VCTF (I, J, JCLS) , J=l, NTERM) 

471 FORMAT (14, 8F9 .2) 

RESET THE VARIABLES TO '0.0' FOR FUTURE USE 


XMCTF (LTERM, JCLS) =VLD 
DO 480 1=1, NTERM 
480 VCTF ( I , LTERM, JCLS ) =VLD 
DO 490 J=l, NTERM 
490 VCTF (LTERM, J, JCLS ) =VLD 
500 CONTINUE 

THE FOLLOWING 7 STATEMENTS ARE USED FOR INTERNAL CHECKING 
C 

C DO 550 JCLS=1, NCLS 

C WRITE (15,*) 'JCLS =', JCLS 

C DO 530 1=1, NTERM 

C 530 WRITE (15,*) I, XMCTF (I, JCLS) 

C DO 540 1=1, NTERM 

C 540 WRITE (15, 421) I, (VCTF (I, J, JCLS), J=l, NTERM) 

C 550 CONTINUE 
600 CONTINUE 
650 CONTINUE 
STOP 
END 

SUBROUTINE SRI (IFILE, NP1 , NFX, RX) 


CLS04170 

CLS04180 

CLS04190 

CLS04200 

CLS04210 

CLS04220 

CLS04230 

CLS04240 

CLS04250 

CLS04260 

CLS04270 

CLS04280 

CLS04290 

CLS04300 

CLS04310 

CLS04320 

CLS04330 

CLS04340 

CLS04350 

CLS04360 

CLS04370 
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THIS SUBROUTINE IS USED TO READ THE INPUT DATA 

CLS04380 

CLS04390 


REAL RX(NPl) 

CLS04400 

CLS04410 


IKX=MOD (NP1, NFX) 

CLS04420 


IMX=NFX* (NP1/NFX) +1 

CLS04430 


ILPX=NP1/NFX 

CLS04440 


IF (ILPX . EQ . 0 ) ILPR1=1 

CLS04450 


IF (NP1 . LT . NFX) GO TO 20 

CLS04460 


DO 10 1=1, ILPX 

CLS04470 

10 

READ (IFILE, *) (RX ( J) , J=l+ (1-1) *NFX, I*NFX) 

CLS04480 


IF (IKX.EQ. 0) GO TO 30 

CLS04490 

20 

READ (IFILE,*) (RX ( J) , J=IMX, NP1) 

CLS04500 

30 

RETURN 

CLS04510 


END 

CLS04520 


c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c- 

c 

c 

c 

c 

c 

c 

c 


PROGRAM CANONIC 

PARAMETER (NTERM=1 8 , MTERM=NTERM* (NTERM+1 ) /2, NCLS=3, 
+NWK=NTERM* (NTERM+2) ) 

REAL XMT (NTERM, NCLS ) , VCVT (MTERM, NCLS ) ,CT(NCLS) , 

+VCVIT (MTERM, NCLS) , D (NTERM) , Z (NTERM, NTERM) ,WK(NWK) , 

+WCS (MTERM) , ACS (MTERM) , WCS1 (MTERM) , TEST (NTERM, NTERM) , 

+T (NTERM, NTERM) , WCSI (MTERM) , XMO (NTERM) 

INTEGER NST(NCLS) 

DATA I JOB, IFLAG1 , 1 OPT, NIN, NOUT/2, 0, 3, 0, 6/ 

NTERM = DIMENSIONALITY IN THE CLASS STATISTICS 
NCLS = TOTAL NUMBER OF INFORMATION CLASSES 

XMT = MEAN VECTORS FOR ALL CLASSES 

VCVT = COV. MATRICES FOR ALL CLASSES 

CT = VARIABLE USED TO STORE M.L. THRESHOLD PARAMETER 

VCVIT = INVERSE COV. MATRICES FOR ALL CLASSES 

D = EIGENVALUES 

Z = CANONICAL FEATURES 

WK = WORKING SPACE FOR IMSL ROUTINES 

WCS = WITHIN CLASS SCATTER MATRIX 

ACS = AMONG CLASS SCATTER MATRIX 

WCSI = TEMPORARY STORAGE FOR WCS 

WCSI = INVERSE MATRIX OF WCS 

XMO = GLOBAL MEAN VECTOR 

TEST = INTERNAL CHECKING FOR MATRIX INVERSION ACCURACY 


CAN00010 
CAN00020 
CAN00030 
CAN00040 
CANO 00 50 
CAN00060 
CAN00070 
CAN00080 
CAN00090 
CAN00100 
CAN00110 
CAN00120 
CAN00130 
CAN00140 
CAN00150 
CAN00160 
CAN00170 
CAN00180 
CAN00190 
CAN00200 
CAN00210 
CAN00220 
CAN00230 
CAN00240 
CAN00250 
CAN00260 
CAN00270 
CAN00280 
CAN00290 


— »CHOOSE OR TYPE IN THE CORRECT NUMBERS OF SAMPLES IN THE DATA SETS CAN00300 


CAN00310 

■CAN00320 


NSET 

FI 

NP2 

A 

B 

C 

DACO 

EXNU 

RUSE 

CAJJ00330 

1 

M2611K1 

832 

WW: 141 

SF : 414 

GS : 277 

760928 

76102207 

1-1622 

CAN00340 

2 

M2611K2 

1551 

WW: 658 

SF:211 

UC : 682 

770503 

77102207 

6515-8096 

CAN00350 

3 

M2611K3 

1477 

WW: 677 

SF : 643 

GS : 157 

770626 

77102207 

8097-9691 

CAN00360 

4 

M2614N1 

1265 

SW: 664 

SF : 437 

NP : 164 

770508 

77102217 

1-1396 

CAN00370 
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5 M2614N2 1239 SW:787 SF:291 NP:161 770629 77102217 2777-4141 CAN00380 

6 M2614N3 1444 SW: 931 SF:330 NP:183 770804 77102217 5426-6993 CAN00390 


c CAN00400 

C DATA NST/141, 414, 277, 658, 211, 682, 677, 643, 157/ CAN00410 

C DATA NST/141, 414, 277, 658, 211, 682, 677, 643, 157, CAN00420 

C +664,437,164,787,291,161,931,330,183/ CAN00430 

C DATA NST/664, 437, 164, 787, 291, 161, 931, 330, 183/ CAN00440 

C DATA NST/141, 414, 277, 658, 211, 682, 677/ CAN00450 

C DATA NST/587, 216,121/ CAN00460 

DATA NST/658, 211, 682/ CAN00470 

c CAN00480 

P CAN00490 


THE FOLLOWING DATA 'NST' ARE USED FOR SOIL ORDER DATA SET. 'SO' CAN00500 


c CANO 0510 
C NP2=479; MOL ALF EN AR UL IN SP VE H OX UNCLASSIFIED CAN00520 
C DATA NST/154, 113, 78, 52, 45, 37, 30, 11, 8, 11, 32/ CAN00530 
C DATA NST/154, 113, 78, 52, 45, 97/ CAN00540 
C DATA NST/154, 113, 78, 52, 45, 37/ CAN00550 
c CAN00560 

c CANO 0570 

c CAN00580 
C THE FOLLOWING DATA 'NST' IS USED FOR SOIL 'OMl' DATA SET CAN00590 
C I.E. (1) MOLLISOL, OR (2) ALFISOL, AND GROUP SAMPLES CAN00600 
C ACCORDING TO THEIR ORGANIC MATERIAL: % WEIGHT CAN00610 
C CLASS 1 TO 6 : NP2 = 255 CAN00620 
C CLSl : .11% ,GE . OM .LE,. 1.5% : # 1 -> # 51 CAN00630 
C CLS2 : 1.5% .GT. OM .LE. 2.0% : # 52 -> # 104 CAN00640 
C CLS3 : 2.0% .GT. OM .LE. 2.5% : # 105 -> # 138 CAN00650 
C CLS4 : 2.5% .GT. OM .LE. 3.5% : # 139 -> # 183 CAN00660 
C CLS5 : 3.5% .GT. OM .LE. 5.0% : # 184 -> # 222 CAN00670 
C CLS6 : 5.0% .GT. OM .LE. 10.12% : # 223 -> # 255 CAN00680 
c CAN00690 

C DATA NST/51, 54, 33, 45, 39, 33/ CAN00700 
c CAN00710 

C DATA ' S2A' : ANOTHER TEST GROUPED BY THE SAME OM RANGES AS 'OM2' CANO 0720 
C OM PERCENTAGE : 0,1; 1,2; 2,3; 3,4; 4,6; 6 AND ABOVE CAN00730 
c CAN00740 

C DATA NST/26, 78, 64, 32, 55/ CAN00750 
c CAN00760 
c CAN00770 

c CAN00780 

C THE FOLLOWING DATA 'NST' IS USED FOR 'OM2' DATA SET CAN00790 
C ACCORDING TO THEIR ORGANIC MATERIAL: % WEIGHT CAN00800 
C CLASS 1 TO 6 : NP2 = 514 CAN00810 
C CLSl : .08% .GE. OM .LE. 1.0% : # 1 -> # 82 CAN00820 
C CLS2 : 1.0% .GT. OM .LE. 2.0% : # 83 -> # 217 CAN00830 
C CLS3 : 2.0% .GT. OM .LE. 3.0% : # 218 -> # 337 CAN00840 
C CLS4 : 3.0% .GT. OM .LE. 4.0% : # 338 -> # 391 CAN00850 
C CLS5 : 4.0% .GT. OM .LE. 6.0% : # 392 -> # 450 CAN00860 
C CLS6 : 6.0% .GT. OM .LE. 84.79% : # 451 -> # 514 CAN00870 
c CAN00880 

C DATA NST/82, 135, 120, 54, 59, 64/ CAN00890 
c CAN00900 

C DATA NST/82, 135, 120, 54, 123/ CAN00910 
C DATA NST/44, 31, 18, 23, 24, 51, 37, 27/ CAN00920 
C DATA NST/83, 57, 94, 31, 37, 59, 103, 26, 24/ CAN00930 
C DATA NST/103, 26, 24/ CAN00940 
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C CAN00950 

C THE FOLLOWING DATA 'NST' IS USED FOR SOIL IRON OXIDE 'IO' DATA SETCAN00960 
C ACCORDING TO THEIR FE203 % WEIGHT CAN00970 

C CLASS 1 TO 6 : NP2 =467 CAN00980 

C CLS1 : .02% .GE. FE203 .LE. 0.4% : # 1 -> # 102 CAN00990 

C CLS2 : 0.4% .GT. FE203 .LE. 0.6% : # 103 -> # 175 CAN01000 

C CLS3 : 0.6% .GT. FE203 .LE. 0.8% : # 176 -> # 244 CAN01010 

C CLS4 : 0.8% .GT. FE203 .LE. 1.2% : # 245 -> # 349 CAN01020 

C CLS5 : 1.2% .GT. FE203 .LE. 1.6% : # 350 -> # 401 CAN01030 

C CLS6 : 1.6% .GT. FE203 .LE. 25.60% : # 402 -> # 467 CAN01040 

c CAN01050 

C DATA NST/102, 73, 69, 105, 52, 66/ CAN01060 

c CAN01070 

c CAN01Q80 

C THE FOLLOWING DATA 'NST' IS USED FOR SOIL TEXTURE 'ST' DATA SET CAN01090 

C ACCORDING TO THEIR SAND-SILT-CLAY % CONTENT CAN01100 

C CLASS 1 TO 6 : NP2 = 483; DETAILS : SEE FILE ( S5L.DATA.C1) CAN01110 

c CAN01120 

C DATA NST/40, 63, 76, 93, 68, 143/ CAN01130 

c CAN01140 

c ______ CANO 11 50 

C THE FOLLOWING DATA 'NST' IS USED FOR S.D. VEGETATION DATA CAN01160 

c CAN01170 

C DATA NST/225, 61, 292, 469, 82,182,63,103, 39,39,217,51, CAN01180 

C +393,441,80,88, 88,41,32,26, 118,43,121,44, 45,102,66,89, CAN01190 

C +78,53,147,39, 24,42,119,69, 76,96,107,154, 28,19/ CAN01200 

c CAN01210 

c — CAN01220 

C THE FOLLOWING DATA 'NST' IS USED FOR IOWA VEGETATION DATA CAN01230 

c CAN01240 

C DATA NST/514,41, 517,36,32, 621,517,45, 610,485,21, CAN01250 

C +437,377,22, 190,172,25, 650,568,42, 435,417,44, 393,267/ CAN01260 

c CAN01270 

c CAN01280 

C 11 = CLASS STATISTICS; 12 = CANONICAL FEATURES CAN01290 

c CAN01300 

OPEN (11) CAN01310 

OPEN (12) CAN01320 

REWIND 11 CAN01330 

REWIND 12 CAN01340 

CANO 13 50 

SET THE INPUT&OUTPUT DO LOOP PARAMETERS CAN01360 

CAN01370 

IK1=M0D (NCLS, 6) CAN01380 

IM1=6* (NCLS/ 6) +1 CAN01390 

IK2=MOD (NTERM, 5) CAN01400 

IM2=5* (NTERM/5) +1 CAN01410 

IK3=MOD (NTERM, 16) CAN01420 

IM3=16* (NTERM/16) +1 CAN01430 

ILPl=NCLS/6 CAN01440 

IF (ILP1 . EQ . 0 ) ILP1=1 CAN01450 

ILP2=NTERM/5 CAN01460 

IF (ILP2 .EQ. 0) ILP2=1 CAN01470 

ILP3=NTERM/1 6 CAN01480 

IF (ILP3 .EQ. 0) ILP3=1 CAN01490 

CAN01500 

SET THE IMSL INPUT&OUTPUT TO THE FEATURE DESIGNER ( SCREEN ) CAN01510 



130 


CALL UGEtlO (IOPT, NIN, NOUT) 

DO 130 LTERM=1 , NTERM 
KTERM=LTERM* (LTERM+1) / 2 
C 

C READ IN CLASS STATISTICS 
C 

DO 30 ITERM=1 , LTERM 
IF (NCLS . LT . 6 ) GO TO 20 
DO 10 IL=1, ILP1 

10 READ (11,*) (XMT (ITERM, JCLS) , JCLS=1+ (IL-1) *6, IL*6) 

IF (IK1 . EQ. 0) GO TO 30 

20 READ (11,*) (XMT (ITERM, JCLS) ,JCLS=IM1, NCLS) 

30 CONTINUE 

DO 60 ITERM=1 , KTERM 
IF (NCLS . LT . 6 ) GO TO 50 
DO 40 IL=1, ILP1 

40 READ (11,*) (VCVT (ITERM, JCLS) , JCLS=1+ (IL-1) *6,IL*6) 
IF ( IK1 . EQ . 0 ) GO TO 60 

50 READ (11,*) (VCVT (ITERM, JCLS) ,JCLS=IM1, NCLS) 

60 CONTINUE 

IF (NCLS . LT . 6 ) GO TO 80 
DO 70 IL=1, ILP1 

70 READ (11,*) (CT (ICLS) , ICLS=1+ (IL-1) *6, IL*6) 

IF (IK1 .EQ. 0) GO TO 90 
80 READ (11,*) (CT(ICLS) , ICLS=IMl, NCLS) 

90 DO 120 ITERM=1, KTERM 
IF (NCLS. LT. 6) GO TO 110 
DO 100 IL=1, ILP1 

100 READ (11,*) (VCVIT (ITERM, JCLS) ,JCLS=1+ (IL-1) *6, IL*6) 
IF (IK1 . EQ. 0) GO TO 120 

110 READ (11,*) (VCVIT (ITERM, JCLS) ,JCLS=IM1, NCLS) 

120 CONTINUE 
130 CONTINUE 
C 

C FIND WITHIN CLASS SCATTER MATRIX 
C 

CALL FWCS (VCVT, MTERM,NST, NCLS, WCS) 

C CALL USWSMC WCS MATRIX IS ',15, WCS, NTERM, 1) 

C 

C FIND AMONG CLASS SCATTER MATRIX 
C 

CALL FACS(XMT, NTERM, MTERM,NST, NCLS, ACS, XMO) 

C C ALL USWSMC ACS MATRIX IS ', 15, ACS, NTERM, 1) 

C 

c FIND CANONICAL FEATURES 

C 

CALL EIGZS (ACS, WCS, NTERM, I JOB, D, Z, NTERM, WK, IER) 

C CALL USWFV( 'CANONIC EVALUES ', 15, D, NTERM, 1, 1) 

C CALL USWSM (' CANONIC EVECTOR' , 1 5, Z , NTERM, 1 ) 

C 

C INTERNAL CHECKING FOR MATRIX INVERSION ACCURACY 

C 

CALL SCOPY (MTERM, WCS, 1,WCS1, 1) 

CALL LINV1P (WCS1 , NTERM, WCSI , IDGT, D1 , D2 , IER1 ) 

CALL VMULSS (WCSI, ACS, NTERM, TEST, NTERM) 

CALL FTRACE (TEST, NTERM, TRACE) 


CAN01520 

CAN01530 

CAN01540 

CANO 1550 

CAN01560 

CAN01570 

CAN01580 

CAN01590 

CAN01600 

CANO 1610 

CAN01620 

CAN01630 

CAN01640 

CANO 1650 

CAN01660 

CAN01670 

CAN01680 

CANO 1690 

CAN01700 

CAN01710 

CAN01720 

CAN01730 

CAN01740 

CAN01750 

CANO 17 60 

CAN01770 

CAN01780 

CAN01790 

CAN01800 

CAN01810 

CAN01820 

CAN01830 

CAN01840 

CAN01850 

CAN01860 

CAN01870 

CAN01880 

CAN01890 

CAN01900 

CAN01910 

CAN01920 

CAN01930 

CAN01940 

CAN01950 

CAN01960 

CAN01970 

CAN01980 

CAN01990 

CAN02000 

CAN02010 

CAN02020 

CAN02030 

CAN02040 

CAN02050 

CAN02060 

CAN02070 

CAN02080 
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c 

C SEND THE ACCURACY COMMENTS TO THE SCREEN 
C 

IF (IER.NE . 0 .OR.WK (1) .GE.l.OJGO TO 140 
WRITE ( * , 3 ) IER, WK ( 1 ) 

GO TO 150 

140 WRITE ( * , 2 ) IER, WK ( 1 ) 

1 FORMAT (5E15. 7) 

2 FORMAT (' PERFORMANCE OF "EIGZS" IS POOR, IER =',I5, 

+ '; WK (1) =’ ,E15. 7) 

3 FORMAT (' PERFORMANCE OF "EIGZS" IS GOOD, IER =',I5, 

+'; WK (1) =' ,E15. 7) 

150 DO 170 I=1,NTERM 

IF(D(I) .LE.0.0)GO TO 160 
GO TO 170 

160 WRITE (*,4)I,D(I) 

4 FORMAT (' EIGEN VALUE IS "< = 0.0" AT I =', 15, 

+' WHERE D (I) =' , E15 . 7) 

IFLAG1=IFLAG1+1 
170 CONTINUE 

IF (IFLAG1 . GT . 0 ) GO TO 180 
WRITE (*, 6) 

GO TO 190 

180 WRITE (*, 5) IFLAG1 

5 FORMAT (' THERE ARE', 15, ' NEGATIVE OR ZERO EIGEN VALUES 

6 FORMAT (' ALL EIGEN VALUES ARE GREATER THAN ZERO ') 

190 CALL VABSMF (D,NTERM, 1, SUM) 

IF (ABS (TRACE-SUM) .GT.1.0E-1 ) GO TO 200 
WRITE (*, 8) TRACE, SUM 
GO TO 210 

200 WRITE (*, 7) TRACE, SUM 

7 FORMAT (' ACCURACY OF "EIGZS" IS POOR, TRACE =',E15.7, 
+'; SUM =' ,E15. 7) 

8 FORMAT ( ' ACCURACY OF "EIGZS" IS GOOD, TRACE =',E15.7, 

+ '; SUM =' ,E15 . 7) 

C 

C SEND THE FINAL RESULTS TO THE CANONICAL FEATURE FILE 
C 

210 WRITE (12, 9) TRACE, SUM 

9 FORMAT (2E15. 7) 

IF (NTERM.LT. 5) GO TO 230 
DO 220 1=1, ILP2 

220 WRITE (12,1) (D (NTERM+l-J) , J=l+ (I— 1 ) *5, 1*5) 

IF (IK2 . EQ. 0) GO TO 240 
230 WRITE (12,1) (D (NTERM+l-J) , J=IM2, NTERM) 

240 DO 270 J=l, NTERM 

IF (NTERM.LT. 5) GO TO 260 
DO 250 1=1, ILP2 

250 WRITE (12,1) (Z (K, NTERM+l-J) , K=l+ (1-1) *5, 1*5) 

IF (IK2 . EQ . 0 ) GO TO 270 

260 WRITE (12,1) (Z (K, NTERM+l-J) , K=IM2, NTERM) 

270 CONTINUE 

CALL VMULSF (WCS, NTERM, Z, NTERM, NTERM, TEST, NTERM) 
NTM=NTERM 

CALL VMULFM (Z, TEST, NTM, NTM, NTM, NTM, NTM, T, NTM, IER) 

C 

C SEND THE ACCURACY COMMENTS TO THE SCREEN 


CAN02090 
CAN02100 
CAN02110 
CAN02120 
CAN02130 
CAN02140 
CAN02150 
CAN02160 
CAN02170 
CAN02180 
CAN02190 
CAN02200 
CAN02210 
CAN02220 
CAN02230 
CAN02240 
CAN02250 
CAN02260 
CAN02270 
CAN02280 
CAN02290 
CAN02300 
CAN02310 
CAN02320 
CAN02330 
CAN02340 
CAN02350 
CAN02360 
CAN02370 
CAN02380 
CAN02390 
CAN02400 
CAN02410 
CAN02420 
CAN02430 
CAN02440 
CAN02450 
CAN02460 
CAN02470 
CAN02480 
CAN02490 
CAN02500 
CAN02510 
CAN02520 
CAN02530 
CAN02540 
CAN02550 
CAN02560 
CANO 2 570 
CAN02580 
CAN02590 
CAN02600 
CAN02610 
CAN02620 
CAN02630 
CAN02640 
CANO 2 650 
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PRINT*, 1 THE FOLLOWING MATRIX MUST BE AN IDENTITY MATRIX' 
IF (NTERM.LT. 16) GO TO 290 
DO 280 IL=1, ILP3 
DO 280 1=1 , NTERM 

280 WRITE (*, 281) (T (I, J) , J=l+ (IL-1) *16, IL*16) 

281 FORMAT (16F5. 2) 

IF (IK3 .EQ. 0) GO TO 310 
290 DO 300 1=1, NTERM 
300 WRITE (*,281) (T (I, J) , J=IM3, NTERM) 

310 STOP 
END 

SUBROUTINE FWCS (VCVT,MTERM, NST, NCLS, WCS) 

C 

C THIS SUBROUTINE FINDS WITHIN CLASS SCATTER MATRIX 
C 

REAL VCVT (MTERM, NCLS) , WCS (MTERM) 

INTEGER NST (NCLS) 

NX1=0 

DO 10 1=1, NCLS 
10 NX1=NX1+NST(I) 

DO 30 1=1, MTERM 
X1=0 . 0 

DO 20 J=1,NCLS 
X2=FLOAT (NST ( J) ) -1 . 0 
20 Xl=Xl+X2*VCVT (I, J) /FLOAT (NX1) 

30 WCS(I)=X1 
RETURN 
END 

Subroutine facs (xmt, nterm, mterm, nst, ncls, acs, xmoj 
c 

C THIS SUBROUTINE FINDS AMONG CLASS SCATTER MATRIX 
C 

REAL XMT (NTERM, NCLS) , ACS (MTERM) , XMO (NTERM) 

INTEGER NST (NCLS) 

NX1=0 

DO 10 1=1, NCLS 
10 NX1=NX1+NST (I) 

DO 30 1=1, NTERM 
X1=0 . 0 

DO 20 J=l, NCLS 
X2=FLOAT (NST ( J) ) 

20 X1=X1+X2*XMT (I, J) /FLOAT (NX1) 

30 XMO(I)=Xl 

DO 50 I=1,NTEFM 
DO 50 J=1,I 
IND= (1-1) *1/2+ J 
X1=0 . 0 

DO 40 ICLS=1, NCLS 

X2=FLOAT (NST (ICLS) ) /FLOAT (NX1) 

40 X1=X1+X2* (XMT (I, ICLS) -XMO (I) ) * (XMT ( J, ICLS) -XMO(J) ) 

50 ACS (IND) =X1 
RETURN 
END 

SUBROUTINE FTRACE (TEST, NTERM, TRACE) 

REAL TEST (NTERM, NTERM) 

TRACE=0 . 0 


CAN02660 
CAN02670 
CAN02680 
CAN02690 
CAN02700 
CAN02710 
CAN02720 
CAN02730 
CAN02740 
CAN02750 
CAN02760 
CAN02770 
CAN02780 
CAN02790 
CAN02800 
CAN02810 
CAN02820 
CAN02830 
CAN02840 
CAN02850 
CAN02860 
CAN02870 
CAN02880 
CAN02890 
CAN02900 
CAN02910 
CAN02920 
CAN02930 
CAN02940 
CAN02950 
CAN02960 
CAN02970 
CAN02980 
CAN02990 
CAN03000 
CAN03010 
CAN03020 
CANO 30 30 
CAN03040 
CAN03050 
CAN03060 
CAN03070 
CAN03080 
CANO 30 90 
CAN03100 
CAN03110 
CANO 31 20 
CAN03130 
CAN03140 
CAN03150 
CANO 31 60 
CANO 31 70 
CAN03180 
CAN03190 
CAN03200 
CAN03210 
CANO 3220 
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DO 10 1=1, NTERM 
10 TRACE=TRACE+TEST(I,I) 
RETURN 
END 


CANO 32 30 
CANO 32 40 
CAN03250 
CANO 32 60 


C 

c 

c 

c- 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


PROGRAM PCFIND 

PARAMETER (NTSET=4,NTERM=1 6, MTERM=NTERM* (NTERM+1) /2,NCLS=3, 
+NSET=1,MSET=1, NDSET=1, NSMAX=100, NZl=NCLS*NCLS*NTERM, 

+IRES=0, IFIND=1, ICKMV=0, NDTRM=1, NZ2=NCLS*NTERM,NTERMC=15) 

IFIND = 1 > NDTRM CONTROL : LTERM=1, NTERM, NDTRM 

IFIND = 0 > NDTRM DISABLE : LTERM = NTERM ONLY 

-» IRES = 1 > NSMAX MUST EXCEED MAX(NST(I)) « NOTES! ! ! 

IRES = 0 > NSMAX CONTROL : SUBROUTINE GGNSM 

NTERMC > OR = NTERM , WHERE NTERMC IS USED TO READ ENTIRE 
TRANSFORMED DATA; WHILE NTERM IS USED TO DECIDE 
HOW MANY OF THEM WILL BE CONTRIBUTED TO PC 

NTERM = TOTAL NUMBER OF FEATURES USED 

NCLS = TOTAL NUMBER OF INFORMATION CLASSES 

NSMAX = PRESET MAX. NO. OF SAMPLES FOR ONE CLASS 

REAL XMT (NTERM, NCLS ) , VCVT (MTERM, NCLS) , CT (NCLS) , 

+VCVIT (MTERM, NCLS ) , TVEC (NSMAX, NTERM, NCLS) , 

+VCVIF (NTERM, NTERM) , VCV (MTERM) , VCVI (MTERM) , XM (NTERM) , 

+PC (NTERM) , QP (NCLS, NTERM) , PR (NCLS, NTERM) , PX (NCLS) , 

+VEC (NSMAX, NTERM) , WK (NTERM) , X (NTERM) , T1 (NTERM) 

REAL XMCK (NTERM) , VCVCK (MTERM) , TX (NTERM) , Y (NTERM) 

REAL RVEC (NSMAX, NTERMC, NCLS) , AP (NCLS) 

INTEGER NBR ( 6 ) , NPC (NCLS, NCLS, NTERM) , NST (NCLS) 

CHARACTER* 2 XC1 
DOUBLE PRECISION DSEED 
DATA XC1/' '/ 

DATA PC /NTERM* 0.0/ 

DATA QP, PR, PX/NZ2*0 . 0, NZ2*0 . 0, NCLS*0 . 0/ 

DATA DSEED, NPC/5. DO, NZ1*0/ 

DATA (NBR (I) , 1=4, 6) , IOPT, NIN,NOUT/l, 0, 0, 3, 0, 6/ 

XMT = MEAN VECTORS FOR ALL CLASSES 

VCVT = COV. MATRICES FOR ALL CLASSES 

CT = M.L. DECISION RULE PARAMETER 

VCVIT = INVERSE COV. MATRICES FOR ALL CLASSES 

TVEC = GENERATED SAMPLE VECTORS 

VCVIF = INVERSE COV. MATRIX IN FULL STORAGE MODE 
VCV = COV. MATRIX 

VCVI = INVERSE COV. MATRIX IN SYMMETRIC STORAGE MODE 
XM = MEAN VECTOR 

PC = PROBABILITY OF CORRECT CLASSIFICATION 

XMCK = CHECKING VECTOR FOR MEAN 

VCVCK = CHECKING MATRIX FOR COVARIANCES 

NBR = IMSL ROUTINE-USED PARAMETER VECTOR 

NPC = CLASSIFICATION RESULT MATRIX 


PCF00010 

PCF00020 

PCF00030 

PCF00040 

PCF00050 

PCF00060 

PCF00070 

PCF00080 

PCF00090 

PCF00100 

PCF00110 

PCF00120 

PCF00130 

PCF00140 

PCF00150 

PCF00160 

PCF00170 

PCF00180 

PCF00190 

PCF00200 

PCF00210 

PCF00220 

PCF00230 

PCF00240 

PCF00250 

PCF00260 

PCF00270 

PCF00280 

PCF00290 

PCF00300 

PCF00310 

PCF00320 

PCF00330 

PCF00340 

PCF00350 

PCF00360 

PCF00370 

PCF00380 

PCF00390 

PCF00400 

PCF00410 

PCF00420 

PCF00430 

PCF00440 

PCF00450 

PCF00460 

PCF00470 

PCF00480 
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q NST = STORE THE TOTAL NO. OF SAMPLES FOR EACH CLASS 

C 

c 

C 

C »CHOOSE OR TYPE IN THE CORRECT NUMBERS OF SAMPLES IN THE DATA SETS 

C 


c 

NSET 

FI 

NP2 

c 

1 

M2611K1 

832 

c 

2 

M2611K2 

1551 

c 

3 

M2611K3 

1477 

c 

4 

M2614N1 

1265 

c 

5 

M2614N2 

1239 

c 

6 

M2614N3 

1444 


C 

C 

C 

C 

C 

C 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


A 

WW: 141 
WW: 658 
WW: 677 
SW: 664 
SW: 787 
SW: 931 


B 

SF: 414 
SF : 211 
SF: 643 
SF : 437 
SF : 291 
SF: 330 


C 

GS : 277 
UC: 682 
GS : 157 
NP : 164 
NP : 161 
NP : 183 


DACO 

760928 

770503 

770626 

770508 

770629 

770804 


EXNU 

76102207 

77102207 

77102207 

77102217 

77102217 

77102217 


RUSE 

1-1622 

6515-8096 

8097-9691 

1-1396 

2777-4141 

5426-6993 


DATA NST/141, 414, 277, 658,211, 682, 677, 643, 157/ 
DATA NST/141, 414,277, 658, 211, 682, 677, 643, 157, 
+664,437,164,787,291,161,931,330,183/ 

DATA NST/664, 437,164,787, 291, 161, 931, 330, 183/ 
DATA NST/141, 414,277, 658, 211,682, 677/ 

DATA NST/587, 216, 121/ 

DATA NST/658, 211, 682/ 


THE FOLLOWING DATA 1 NST' ARE USED FOR SOIL ORDER DATA SET. 'SO' 

NP2=479; MOL ALF EN AR UL IN SP VE H OX UNCLASSIFIED 
DATA NST/154, 113, 78, 52, 45, 37, 30, 11, 8, 11, 32/ 

DATA NST/154, 113, 78, 52, 45, 97/ 

DATA NST/154, 113, 78, 52, 45, 37/ 


THE FOLLOWING DATA 'NST' IS USED FOR SOIL 'OM1' DATA SET 
I.E. (1) MOLLISOL, OR (2) ALFISOL, AND GROUP SAMPLES 
ACCORDING TO THEIR ORGANIC MATERIAL: % WEIGHT 
CLASS 1 TO 6 : NP2 = 255 

CLS1 : .11% .GE. OM .LE. 1.5% 

CLS2 : 1.5% .GT. OM .LE. 2.0% 

CLS3 : 2.0% .GT. OM .LE. 2.5% 

CLS4 : 2.5% -GT. OM .LE. 3.5% 

CLS5 : 3.5% .GT. OM .LE. 5.0% 

CLS6 : 5.0% .GT. OM .'LE. 10.12% 


# 


1 -> # 51 

# 52 -> # 104 

# 105 -> # 138 

# 139 -> # 183 

# 184 -> # 222 

# 223 -> # 255 


DATA NST/51, 54, 33, 45, 39, 33/ 


DATA ' S2A' : ANOTHER TEST GROUPED BY THE SAME OM RANGES AS 'OM2' 

OM PERCENTAGE : 0,1; 1,2; 2,3; 3,4; 4,6; 6 AND ABOVE 

DATA NST/26, 78, 64, 32, 55/ 


THE FOLLOWING DATA 'NST' IS USED FOR 'OM2' DATA SET 
ACCORDING TO THEIR ORGANIC MATERIAL: % WEIGHT 
CLASS 1 TO 6 : NP2 = 514 

CLS1 : .08% .GE. OM .LE. 1.0% : # 1 -> # 82 


PCF00490 

PCF00500 

■PCF00510 

PCF00520 

PCF00530 

PCF00540 

-PCF00550 

PCF00560 

PCF00570 

PCF00580 

PCF00590 

PCF00600 

PCF00610 

PCF00620 

PCF00630 

PCF00640 

PCF00650 

PCF00660 

PCF00670 

PCF00680 

PCF00690 

PCF00700 

PCF00710 

-PCF00720 

PCF00730 

PCF00740 

PCF00750 

PCF00760 

PCF00770 

PCF00780 

PCF00790 

-PCF00800 

PCF00810 

PCF00820 

PCF00830 

PCF00840 

PCF00850 

PCF00860 

PCF00870 

PCF00880 

PCF00890 

PCF00900 

PCF00910 

PCF00920 

PCF00930 

PCF00940 

PCF00950 

PCF00960 

PCF00970 

PCF00980 

PCF00990 

PCF01000 

PCF01010 

PCF01020 

PCF01030 

PCF01040 

PCF01050 
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C CLS2 : 1.0% .GT. OM .LE. 2.0% : # 83 -> # 217 PCF01060 

C CLS3 : 2.0% .GT. OM .LE. 3.0% : # 218 -> # 337 PCF01070 

C CLS4 : 3.0% .GT. OM .LE. 4.0% : # 338 -> # 391 PCF01080 

C CLS5 : 4.0% .GT. OM .LE. 6.0% : # 392 -> # 450 PCF01090 

C CLS6 : 6.0% .GT. OM .LE. 84.79% : # 451 -> # 514 PCF01100 

C PCF01110 

C DATA NST/82, 135, 120, 54, 59, 64/ PCF01120 

C PCF01130 

C DATA NST/82, 135, 120, 54, 123/ PCF01140 

C DATA NST/44, 31, 18, 23, 24, 51, 37,27/ PCF01150 

C DATA NST/83, 57, 94, 31, 37, 59, 103, 26, 24/ PCF01160 

C DATA NST/103, 26,24/ PCF01170 

c _____ PCF01180 

C THE FOLLOWING DATA 1 NST' IS USED FOR SOIL IRON OXIDE 'IO' DATA SETPCF01190 
C ACCORDING TO THEIR FE203 % WEIGHT PCF01200 

C CLASS 1 TO 6 : NP2 =467 PCF01210 

C CLS1 : .02% .GE. FE203 .LE. 0.4% : # 1 -> # 102 PCF01220 

C CLS2 : 0.4% .GT. FE203 .LE. 0.6% : # 103 -> # 175 PCF01230 

C CLS3 : 0.6% .GT. FE203 .LE. 0.8% : # 176 -> # 244 PCF01240 

C CLS4 : 0.8% .GT. FE203 .LE. 1.2% : # 245 -> # 349 PCF01250 

C CLS5 : 1.2% .GT. FE203 .LE. 1.6% : # 350 -> # 401 PCF01260 

C CLS6 : 1.6% .GT. FE203 .LE. 25.60% : # 402 -> # 467 PCF01270 

C PCF01280 

C DATA NST/102, 73, 69, 105, 52, 66/ PCF01290 

C PCF01300 

c PCF01310 

C THE FOLLOWING DATA 'NST' IS USED FOR SOIL TEXTURE 'ST' DATA SET PCF01320 

C ACCORDING TO THEIR SAND-SILT-CLAY % CONTENT PCF01330 

C CLASS 1 TO 6 : NP2 = 483; DETAILS : SEE FILE ( S5L.DATA.C1) PCF01340 

C PCF01350 

C DATA NST/40, 63, 76, 93, 68, 143/ PCF01360 

C PCF01370 

c — PCF01380 

C THE FOLLOWING DATA 'NST' IS USED FOR S.D. VEGETATION DATA PCF01390 

C PCF01400 

C DATA NST/225, 61, 292, 469, 82,182,63,103, 39,39,217,51, PCF01410 

C +393,441,80,88, 88,41,32,26, 118,43,121,44, 45,102,66,89, PCF01420 

C +78,53,147,39, 24,42,119,69, 76,96,107,154, 28,19/ PCF01430 

C PCF01440 

c PCF01450 

C THE FOLLOWING DATA 'NST' IS USED FOR IOWA VEGETATION DATA PCF01460 

C PCF01470 

C DATA NST/ 514, 41, 517,36,32, 621,517,45, 610,485,21, PCF01480 

C +437,377,22, 190,172,25, 650,568,42, 435,417,44, 393,267/ PCF01490 

C PCF01500 

c : PCF01510 

C PCF01520 

C 11 = TRANSFORMED DATA; 12 = CLASS STATISCTICS; 13 = PC PCF01530 

C PCF01540 

OPEN (11) PCF01550 

OPEN (12) PCF01560 

OPEN (13) PCF01570 

REWIND 11 PCF01580 

REWIND 12 PCF01590 

REWIND 13 PCF01600 

NX2=0 PCF01610 

DO 1 I=1,NCLS PCF01620 
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1 NX2=NX2+NST (I) 

IF(IRES.EQ.O)GO TO 3 
DO 2 1=1, NCLS 
NX1=NST (I) 

2 AP (I) =FLOAT (NX1 ) /FLOAT (NX2 ) 

GO TO 5 

3 DO 4 1=1, NCLS 

4 AP (I) =1.0 /FLOAT (NCLS) 

5 IK=MOD (NCLS, 6) 

C 

C SET THE INPUT&OUTPUT DO LOOP PARAMETERS 
C 

IM=6* (NCLS/ 6) +1 
IKl=MOD (NCLS, 3) 

IM1=3* (NCLS/3) +1 
IK2=MOD (NCLS, 15) 

IM2=15* (NCLS/15) +1 

ILPl=NCLS/6 

IF (ILP1 . EQ. 0) ILP1=1 

ILP2=NCLS/3 

IF (ILP2 , EQ. 0) ILP2=1 

ILP3=NCLS/15 

IF (ILP3.EQ, 0) ILP3=1 

IF (IRES . EQ . 0 ) NSAMP=NSMAX 

DO 5_50 ISET=NSET, MSET, NDSET 

IF (IRES . EQ. 1 ) CALL RDATA (ISET, RVEC, NSMAX, NTERMC, NCLS, NST) 
C 

C READ IN CLASS STATISTICS 
C 

DO 500 LTERM=1 , NTERM 
KTERM=LTERM* (LTERM+1) /2 
DO 30 ITERM=1 , LTERM 
IF (NCLS. LT. 6) GO TO 20 
DO 10 IL=1, ILP1 

10 READ (12,*) (XMT (ITERM, JCLS) , JCLS=1+ (IL-1) *6, IL*6) 

IF (IK.EQ. 0) GO TO 30 

20 READ (12,*) (XMT (ITERM, JCLS), JCLS=IM, NCLS) 

30 CONTINUE 

DO 60 ITERM=1 , KTERM 
IF (NCLS. LT. 6) GO TO 50 
DO 40 IL=1, ILP1 

40 READ (12,*) (VCVT (ITERM, JCLS) , JCLS=1+ (IL-1 ) *6, IL*6) 

IF (IK.EQ. 0) GO TO 60 

50 READ (12,*) (VCVT (ITERM, JCLS) , JCLS=IM, NCLS) 

60 CONTINUE 

IF (NCLS . LT , 6 ) GO TO 80 
DO 70 IL=1 , ILP1 

70 READ (12,*) (CT (ICLS) , ICLS=1+ (IL-1) *6, IL*6) 

IF (IK.EQ. 0) GO TO 90 

80 READ (12,*) (CT (ICLS) , ICLS=IM, NCLS) 

90 DO 120 ITERM=1, KTERM 
IF (NCLS . LT . 6) GO TO 110 
DO 100 IL=1, ILP1 

100 READ (12,*) (VCVIT (ITERM, JCLS) , JCLS=1+ (IL-1) *6, IL*6) 

IF (IK.EQ. 0) GO TO 120 

110 READ (12,*) (VCVIT (ITERM, JCLS) , JCLS=IM, NCLS) 

120 CONTINUE 


PCF01630 

PCF01640 

PCF01650 

PCF01660 

PCF01670 

PCF01680 

PCF01690 

PCF01700 

PCF01710 

PCF01720 

PCF01730 

PCF01740 

PCF01750 

PCF01760 

PCF01770 

PCF01780 

PCF01790 

PCF01800 

PCF01810 

PCF01820 

PCF01830 

PCF01840 

PCF01850 

PCF01860 

PCF01870 

PCF01880 

PCF01890 

PCF01900 

PCF01910 

PCF01920 

PCF01930 

PCF01940 

PCF01950 

PCF01960 

PCF01970 

PCF01980 

PCF01990 

PCF02000 

PCF02010 

PCF02020 

PCF02030 

PCF02040 

PCF02050 

PCF02060 

PCF02070 

PCF02080 

PCF02090 

PCF02100 

PCF02110 

PCF02120 

PCF02130 

PCF02140 

PCF02150 

PCF02160 

PCF02170 

PCF02180 

PCF02190 



oooooo noon o ooo non non o non 
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IF (IFIND .EQ. 1) GO TO 125 
IF (LTERM . NE . NTERM) GO TO 500 
IF (IFIND . EQ. 0) GO TO 128 

FIND THE PC RESULTS FOR EVERY DTERM INCREMENT 

125 NX1=LTERM+ (NDTRM-1 ) 

NX2=MOD (NX1 , NDTRM) 

PRINT*, NX1,NX2 
IF (NX2 . NE . 0 ) GO TO 500 
128 DO 170 JCLS=1 , NCLS 
DO 130 1=1, KTERM 
130 VCV ( I ) =VCVT ( I , JCLS ) 

CALL UGETIO (IOPT, NIN,NOUT) 

CALL USWSM ( ' THE MATRIX IS ' , 15, VCV, LTERM, 1) 

NOTE : WK (1 ) MUST BE 0.0 EVERY TIME TO INITIALIZE ' GGNSM ' 
NOTE : VCV WILL BE CHANGED AFTER ' GGNSM' 

IF (IRES . EQ. 1) NSAMP=NST (JCLS) 

IF (IRES . EQ. 1) GO TO 145 
DO 140 1=1, NTERM 
140 WK(I) =0 . 0 
DSEED=5 . DO 

GENERATE GAUSSIAN SAMPLES ACCORDING TO THE CLASS STATISTICS 

CALL GGNSM (DSEED, NSAMP, LTERM, VCV, NSMAX, VEC, WK, IER) 

145 DO 155 1=1, NSAMP 
DO 155 J=l, LTERM 
IF ( IRES . EQ . 1 ) GO TO 150 
VEC (I, J) =VEC (I, J) +XMT ( J, JCLS) 

STORE THE SAMPLES INTO ARRAY 'TVEC' 

TVEC ( I , J, JCLS ) =VEC ( I , J) 

GO TO 155 

150 TVEC (I, J, JCLS) =RVEC (I, J, JCLS) 

VEC (I, J) =RVEC (I, J, JCLS) 

PRINT*, JCLS, I, J, TVEC (I, J, JCLS) 

155 CONTINUE 

IF (ICKMV.EQ. 0) GO TO 170 

CHECK THE MEAN VECTOR AND COV. MATRIX OF THE GENERATED SAMPLES 
THE MATRIX 'VEC' WILL BE CHANGED AFTER ' BECOVM ' 

DO 160 1=1, NTERM 
160 TX (I) =0 . 0 

NBR ( 1 ) =LTERM 
NBR ( 2 ) =NSAMP 
NBR (3) =NSAMP 
IF (LTERM . GT . 1 ) GO TO 600 

CALL BECOVM (VEC, NSMAX, NBR, TX, XMCK, VCVCK, IER) 

SEND THE CHECKING RESULTS TO THE SCREEN IF NEEDED 

CALL USWFVC THE VECTOR IS ', 15, XMCK, LTERM, 1, 1) 

CALL USWSM (' THE MATRIX IS ', 15, VCVCK, LTERM, 1) 

170 CONTINUE 


PCF02200 

PCF02210 

PCF02220 

PCF02230 

PCF02240 

PCF02250 

PCF02260 

PCF02270 

PCF02280 

PCF02290 

PCF02300 

PCF02310 

PCF02320 

PCF02330 

PCF02340 

PCF02350 

PCF02360 

PCF02370 

PCF02380 

PCF02390 

PCF02400 

PCF02410 

PCF02420 

PCF02430 

PCF02440 

PCF02450 

PCF02460 

PCF02470 

PCF02480 

PCF02490 

PCF02500 

PCF02510 

PCF02520 

PCF02530 

PCF02540 

PCF02550 

PCF02560 

PCF02570 

PCF02580 

PCF02590 

PCF02600 

PCF02610 

PCF02620 

PCF02630 

PCF02640 

PCF02650 

PCF02660 

PCF02670 

PCF02680 

PCF02690 

PCF02700 

PCF02710 

PCF02720 

PCF02730 

PCF02740 

PCF02750 

PCF02760 
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C START CLASSIFICATION JOB FOR EACH CLASS SAMPLES 

C 

DO 230 JCLS=1 , NCLS 

IF (IRES . EQ . 1 ) NSAMP=NST ( JCLS ) 

PRINT* , LTERM, JCLS , NSAMP 
DO 230 ISAMP=1, NSAMP 
DO 180 J=l, LTERM 
180 Y (J)=TVEC(ISAMP, J, JCLS) 

DO 220 KCLS=1 , NCLS 

C THE FOLLOWING IS NEEDED SINCE X HAS BEEN CHANGED FOR EVERY KCLS! 
DO 190 1=1, LTERM 
190 X (I) =Y (I) 

DO 200 I=1,KTERM 
200 VCVI (I) =VCVIT (I, KCLS) 

CALL VCVTSF (VCVI , LTERM, VCVIF , NTERM) 

DO 210 1=1, LTERM 
210 XM(I)=XMT (I, KCLS) 

CALL SAXPY (LTERM, -1 . , XM, 1, X, 1 ) 

CALL VMULFM (X, VCVIF, LTERM, 1 , LTERM, NTERM, NTERM, Tl, 1, IER) 

CALL VMULFF (Tl , X, 1 , LTERM, 1,1, NTERM, T2 , 1 , IER) 

T3=EXP (-0. 5*T2) 

220 PX (KCLS) =AP (KCLS) *CT (KCLS) *T3 
C 

C PERFORM M.L. DECISION RULE 

C 

CALL VABMXF(PX(1) , NCLS, 1,IMAX, BIG) 

NPC (JCLS, IMAX, LTERM) =NPC (JCLS, IMAX, LTERM) +1 
C CALL VABSMF(PX, NCLS, 1, DEN) 

C Q=BIG/DEN 

C WRITE (13, *) JCLS, ISAMP, IMAX, NPC (JCLS, IMAX, LTERM) 

C WRITE (13,*) (PX(I) ,1=1, NCLS) , IMAX, BIG 

230 CONTINUE 
C 

C FIND PROBABILITY OF CORRECT CLASSIFICATION PC FROM NPC 

C 

NC1=0 

NC2=0 

DO 240 1=1, NCLS 

IF (IRES . EQ . 0 ) NST ( I ) =NSMAX 

PR (I, LTERM) = (FLOAT (NPC (1,1, LTERM) ) ) /FLOAT (NST(I) ) 

NC1=NC1+NPC (I, I, LTERM) 

240 NC2=NC2+NST (I) 

IF (IRES .EQ. 0) NC2=NSMAX*NCLS 
PC (LTERM) = (FLOAT (NCI ) ) /FLOAT (NC2 ) 

IF (NCLS . LT . 3) GO TO 260 
C 

C SEND THE RESULTS TO THE SCREEN 

C 

DO 250 IL=1, ILP2 

250 WRITE (*,*) ISET, LTERM, (PR (I, LTERM) , 1=1+ (IL-1 ) *3, IL*3) 

IF (IK1 . EQ . 0 ) GO TO 270 

260 WRITE (*,*) ISET, LTERM, (PR (I, LTERM) , I=IMl, NCLS) 

270 PRINT* , ISET, LTERM, PC (LTERM) 

C 

C SEND THE RESULTS TO THE PC FILE 
C 


PCF02770 

PCF02780 

PCF02790 

PCF02800 

PCF02810 

PCF02820 

PCF02830 

PCF02840 

PCF02850 

PCF02860 

PCF02870 

PCF02880 

PCF02890 

PCF02900 

PCF02910 

PCF02920 

PCF02930 

PCF02940 

PCF02950 

PCF02960 

PCF02970 

PCF02980 

PCF02990 

PCF03000 

PCF03010 

PCF03020 

PCF03030 

PCF03040 

PCF03050 

PCF03060 

PCF03070 

PCF03080 

PCF03090 

PCF03100 

PCF03110 

PCF03120 

PCF03130 

PCF03140 

PCF03150 

PCF03160 

PCF03170 

PCF03180 

PCF03190 

PCF03200 

PCF03210 

PCF03220 

PCF03230 

PCF03240 

PCF03250 

PCF03260 

PCF03270 

PCF03280 

PCF03290 

PCF03300 

PCF03310 

PCF03320 

PCF03330 
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WRITE (13, *) ' LTERM = ' , LTERM 

PCF03340 


IF (NCLS . LT . 6 ) GO TO 290 

PCF03350 


DO 280 IL=1, ILP1 

PCF03360 

280 

WRITE (13, 301) (PR (I, LTERM) , 1=1+ (IL-1) *6, IL*6) 

PCF03370 


IF (IK. EQ. 0) GO TO 300 

PCF03380 

290 

WRITE (13, 301) (PR (I, LTERM) , I=IM,NCLS) 

PCF03390 

300 

WRITE (13, 301 ) PC (LTERM) 

PCF03400 

301 

FORMAT ( 6F13 . 5 ) 

PCF03410 

PCF03420 

< 

RESET ALL RELATED VARIABLES > 

PCF03430 


THE FOLLOWING ZEROING PROCEDURES ARE 'ABSOLUTELY' NEEDED!! 

PCF03440 


THIS IS DONE FOR EVERY " LTERM = 1, NTERM " 

PCF03450 

PCF03460 


DO 310 K=1 , NCLS 

PCF03470 


DO 310 1=1 , NSMAX 

PCF03480 


DO 310 J=l, NTERM 

PCF03490 

310 

TVEC (I, J, K) =0 . 0 

PCF03500 


DO 320 1=1, NCLS 

PCF03510 


DO 320 J=l, NTERM 

PCF03520 


QP (I, J) =0 . 0 

PCF03530 

320 

PR(I, J) =0 . 0 

PCF03540 


DO 330 1=1 , NTERM 

PCF03550 

330 

PC(I)=0.0 

PCF03560 


IF (NCLS.LT. 15) GO TO 360 

PCF03570 

PCF03580 


SEND THE FINAL CLASSIFICATION MATRIX NPC TO THE PC FILE 

PCF03590 

PCF03600 


DO 350 J=l, ILP3 

PCF03610 


DO 340 1=1, NCLS 

PCF03620 

340 

WRITE (13, 341) I, (NPC (I, K, LTERM) , K=l+ ( J-l) *15, J*15) 

PCF03630 

341 

FORMAT (13, 2X, 1515) 

PCF03640 


WRITE (13, 342) XC1 

PCF03650 

342 

FORMAT (A2) 

PCF03660 

350 

CONTINUE 

PCF03670 


IF (IK2 .EQ. 0) GO TO 500 

PCF03680 

360 

DO 370 1=1, NCLS 

PCF03690 

370 

WRITE (13, 341) I, (NPC (I, K, LTERM) ,K=IM2, NCLS) 

PCF03700 

500 

CONTINUE 

PCF03710 


DO 510 1=1, NCLS 

PCF03720 


DO 510 J=1 , NCLS 

PCF03730 


DO 510 K=l, NTERM 

PCF03740 

510 

NPC (I, J, K) =0 

PCF03750 

550 

CONTINUE 

PCF03760 

PCF03770 


THE FOLLOWING STATEMENT IS USED FOR INTERNAL CHECKING 

PCF03780 

PCF03790 

600 

STOP 

PCF03800 


STOP 

PCF03810 


END 

PCF03820 


SUBROUTINE RDATA (LSET, RVEC, NSMAX, NTERMC, NCLS, NST) 

PCF03830 


REAL RVEC (NSMAX, NTERMC, NCLS) 

PCF03840 


INTEGER NST (NCLS) 

PCF03850 


IKX=MOD (NTERMC, 5) 

PCF03860 


IMX=5* (NTERMC/5) +1 

PCF03870 


ILPX=NTERMC/5 

PCF03880 


IF (ILPX. EQ. 0) ILPX=1 

PCF03890 


IFILE1=11+ (LSET-1) *10 

PCF03900 



140 


DO 40 K=1 , NCLS 
Nl=NST (K) 

PRINT*,' KCLS = 1 , K, ' ; NS AMP = ',N1 
DO 30 1=1, N1 
IF (NTERMC.LT. 5) GO TO 20 
DO 10 Jl=l , ILPX 

10 READ (IFILEl, *) (RVEC (I, J,K) , J=l+ (Jl-1) *5, Jl*5) 
IF (IKX . EQ . 0 ) GO TO 30 

20 READ (IFILEl,*) (RVEC (I, J, K) , J=IMX, NTERMC) 

30 CONTINUE 
40 CONTINUE 
RETURN 
END 


PCF03910 

PCF03920 

PCF03930 

PCF03940 

PCF03950 

PCF03960 

PCF03970 

PCF03980 

PCF03990 

PCF04000 

PCF04010 

PCF04020 

PCF04030 


